Neural networks have successfully transitioned from an academic interest into a
viable  technology  which  is  now  being  used  in  everyday  products.  To  date,  neural  net- 
works have  been  predominantly  applied  to  forecasting  or  modeling  applications.  Based  on 
their  success  in  such  applications,  there  has  been  significant  interest  in  using  neural  net- 
works in  control  applications,  creating  a  new  field  called  neurocontrol.  Although  there 
have  been  significant  advances  in  the  theory  of  neurocontrol,  there  are  very  few  successful 
commercial applications using neurocontrollers. Commercial applications often provide
the  most  challenging  problems  because  the  controllers  are  required  to  function  robustly  in 
complex  and  unknown  environments.  Real-world  processes  are  complex  and  difficult  to 
control  because  they  contain  a  large  number  of  highly  interdependent  variables,  have 
highly  nonlinear  responses  to  these  variables,  and  change  their  response  over  time. 

This work identified two significant reasons why neurocontrol designs fail in real-world
applications. First, the controllable parameters of most industrial processes are highly
correlated, often not for physical reasons but because of our process control strategies.
Second, intermediate process states that affect the process output, and that are themselves
affected by the controllable parameters, have a significant impact on controller performance.
When the controller changes the controllable parameters, the resulting change in the process
states, which in turn affects the process output, is not accounted for in most neurocontrol
designs in the literature.

This  dissertation  advances  the  field  of  neurocontrol  by  providing  the  following 
solutions: first, the use of statistical significance testing on the local linearized relationships
extracted from nonlinear neural network models to avoid problems with correlated
controllable  parameters;  second,  augmenting  neurocontrol  designs  to  incorporate  depen- 
dent state  models.  These  enhancements  have  been  applied  to  four  distinct  neurocontrol 
architectures.  The  new  control  architectures  have  been  applied  to  the  novel  application  of 
controlling NOx emission from an oil and gas-fired electric power plant.

CHAPTER 1
INTRODUCTION 

Neural  networks  have  found  an  applications  niche  as  robust  predictors  that  have  dem- 
onstrated the  ability  to  out-forecast  more  traditional  methods  in  complex  real-world  appli- 
cations. The  vast  majority  of  neural  network  applications  to  date  rely  solely  on  the 
model's  ability  to  forecast  without  regard  for  what  the  model  has  learned  about  the  rela- 
tionships within  the  underlying  process  or  the  ability  to  affect  it.  Soulie  and  Gallinari  [59] 
recently  compiled  53  industrial  applications  of  neural  networks,  of  which  only  4  make  any 
attempt  to  make  inferences  about  the  underlying  process  or  to  control  it. 

This ratio does not apply to theoretical publications in the literature, however. Research
into neural network-based control, termed neurocontrol [40], has recently received widespread
attention. Many authors have presented the abstract concepts behind neurocontrol
[40][43][31][37], but the literature contains a disproportionately low number of papers
presenting real-world neurocontrol applications. The vast majority of these papers have
developed controllers for fabricated, simulated, or laboratory-controlled processes.

1.1  Proposed  Work 

The goal of this project is to develop robust neurocontrol design methodologies for
complex industrial process control applications. Industrial control applications are
characterized by nonlinear, noisy, non-Gaussian, highly correlated, and nonstationary processes.
This is an applications area where classical control designs have proven ineffective.

1.1.1 Case Study

As  a  case  study,  this  work  develops  on-line  advisory  neurocontrollers  designed  to  min- 
imize the  NOx  emissions  for  an  oil  and  gas  co-fired  power  plant.  The  combustion  of  fossil 
fuels  inside  a  large-scale  boiler  is  a  highly  complex  process;  this  complexity  is  a  direct 
function  of  the  boiler  size.  A  typical  electric  power  boiler  maintains  a  "fireball"  which  is  3 
to  5  stories  tall,  and  there  are  hundreds  of  parameters  which  affect  the  injection  of  fuel  and 
air  at  different  locations  within  the  furnace.  The  problem  is  our  lack  of  understanding 
about  how  these  combustion  parameters  affect  NOx  formation.  This  multivariate  optimi- 
zation problem  requires  a  technology  that  can  look  at  the  process  globally  and  determine 
the appropriate combination of combustion controls.

The  neurocontrollers  developed  in  this  study  will  forward  control  setpoints  to  opera- 
tors through  the  plant's  existing  distributed  control  system  (DCS).  The  neurocontrollers 
will  be  required  to  provide  setpoints  that  minimize  NOx  emissions,  while  maintaining  unit 
operating  constraints.  A  demonstration  system  has  been  completed  at  Canal  Electric's 
580-MW  tangentially  fired  Unit  2.  Charles  River  Associates  (CRA)  and  Commonwealth 
Energy  jointly  funded  this  study. 

1.1.2  Objectives 

The  author  believes  that  neurocontrol  strategies  have  not  been  more  successful  in  real- 
world  applications  because  they 

1)  are  difficult  to  implement, 

2)  fail  to  account  for  dependent  internal  process  states,  and 

3)  have  difficulty  dealing  with  correlated  process  variables. 
The scientific objectives for this work are to develop:

1) an application-based neurocontrol implementation methodology,

2)  state-space  neurocontrol  architectures, 

3)  methods  for  dealing  with  correlated  data, 

4)  accurate  combustion  models,  and 

5)  a  novel  combustion  controller. 

1.1.2.1  Application-Based  Neurocontrol  Implementation  Methodology 

Most neurocontrol implementations in the literature have been ad hoc and
application-specific [40]. In many cases, the process has been simulated and is thus known
completely. Modern control theory has introduced many sophisticated control designs, but the
fact remains that approximately 90% of industrial control applications apply a simple
proportional-integral-derivative (PID) controller [37].

The  PID  controller  only  requires  the  process  engineer  to  specify  reasonable  knowl- 
edge of  the  process.  This  application-based  implementation  methodology  is  largely 
responsible  for  the  success  of  PID  control  in  the  industry.  If  neurocontrollers  are  to  enter 
the  mainstream  process  control  market,  there  will  have  to  be  designs  that  do  not  require 
detailed  knowledge  of  neural  network  or  neurocontrol  theory. 

1.1.2.2  Application-Based  Neurocontrol  Implementation  Methodology 

We begin in Chapter 4 by defining a methodology for categorizing process variables
into groups based on a set of objective criteria about their role in the process. Each
controller will be implemented in Chapter 7 based on this labeling of the process variables,
without additional process knowledge. This will allow process engineers with reasonable
process knowledge, but without any knowledge of neural network or neurocontrol theory,
to successfully deploy such technologies. Notice that achieving such a straightforward
implementation methodology does not imply that the details behind the neurocontroller
implementation are easy; it simply requires that they can be automated.

1.1.2.3 State-Space Neurocontrol Designs

Nonlinear state-space neural network architectures offer the greatest modeling potential,
but the difficulties in their training have led investigators to reject their application
[31]. In fact, most neurocontrol designs are based on input/output process models
[31][43][40]. The use of input/output neural network architectures in the design of
neurocontrollers assumes that all input variables are independent, a situation which is not
likely to be found in real-world applications. Many of the process variables that affect the
plant's output will depend on the same process setpoints that a controller is manipulating.
These facts have limited the performance and complicated the implementation of neurocontrol
designs. If neurocontrol is to be a viable methodology in industrial process situations, then
the controllers will have to be extended with architectures capable of dealing with internal
process states.

There has been limited success in applying state-space neural network models [31]. It
is widely accepted, however, that state-space representations hold the most promise for
modeling and controlling complex processes [34][35]. The literature seems to be treating
the viability of state-space architectures as an all-or-nothing affair. Most publications
apply purely input/output architectures with overwhelming success, while a few investigators
have tested purely state-space architectures, where all states are treated as hidden and
unknown, with limited to no success.

This work will empirically investigate several shades of gray, ranging from purely
input/output to purely state-space controllers. These state-space controller designs are
presented in Chapter 4, and their performance is empirically investigated in Chapters 7 and 8.
The primary difference in the state-space representations proposed here is that the state
variables are treated as known rather than hidden. This will require an extension to the
neurocontrol design strategies presented in the literature [31][43][40].

1.1.2.4  Methods  for  Dealing  with  Correlation 

Industrial process applications are unique in that there is a massive amount of available
data. The input variables for complex process models are typically highly correlated,
a situation for which there are few solutions in the literature. This correlation can come
from several sources: dependent states (as addressed above), physical linkages, and soft
linkages through control strategies or the lack of adequate system parameterization.
Industrial process control applications will require the process engineer to select
representative variables from a large set of available process variables. Because of issues
like input correlation, the representative variables selected will have a significant impact
on the performance of the resulting neurocontroller. A viable neurocontrol design methodology
will have to be able to cope with correlated process variables. Linear control theory deals
with this aspect through parameterization of the controller. Nonlinear control theory, however,
has not solved this problem in general. In Chapter 8 we propose to compute sensitivities
through a committee of trained neural models to select the best variables for system
identification and control.

State-space neurocontrol architectures will be able to explicitly deal with one source
of correlation present in industrial processes, namely correlation produced by dependent
state variables. As mentioned above, however, there are several other sources of correlation.
Our first goal is to empirically quantify the impact of this correlation on our modeling
and control objectives. We begin by ignoring other sources of correlation and
investigate their impact in Chapter 7. Once specific problems have been identified, methods
are developed for dealing with correlation during controller implementation in Chapter 8,
and the performance of these methods is quantified with respect to the performance
of the resulting controllers.

1.1.2.5  Accurate  Combustion  Models 

Little is known about how NOx is formed from air-bound nitrogen during combustion.
To date, reliable models for NOx formation in electric power boilers have not been
available [39]. There are not adequate models for many real-world industrial processes, a fact
which has also limited the acceptance of modern control strategies. One reason that
neurocontrol has received such widespread attention is its potential ability to deal
with very complex processes that have escaped modeling.

Chapter  6  develops  accurate  combustion  models  according  to  accepted  modeling  per- 
formance metrics.  Chapter  7  demonstrates  the  impact  that  correlation  has  on  neurocontrol 
designs,  and  Chapter  8  investigates  its  impact  on  the  accuracy  of  the  underlying  process 
models.  Here  it  will  be  shown  that  accuracy  is  subjective,  and  that  in  fact  no  good  metrics 
for  model  accuracy  are  available  in  the  literature.  A  new  metric  is  proposed  and  empiri- 
cally compared  against  available  metrics  in  the  literature  in  Chapter  8.  Applying  this  new 
metric, predictive combustion models are developed and used to shed light on which process
variables have the greatest impact on NOx and CO formation.

1.1.2.6 Novel Combustion Controller

This project develops four neurocontrollers for the complex industrial process of NOx
formation. We begin in Chapter 3 by looking at boiler optimization from a first-principles
perspective, focusing on what a NOx controller is expected to achieve and why classical
control methods are not able to achieve it. Predictive neural network combustion models
are then developed in Chapter 6, and deployed within online neurocontrollers in Chapters 7
and 8. The performance of each of these controllers is then quantified to compare and
contrast the four control designs in Chapter 8.

To the author's knowledge, this work developed the first NOx controller for a gas and
oil co-fired electric power plant. New regulations and the restructuring of the electric
power industry have combined to create a NOx trading market. The annual benefits to a
gas and oil co-fired electric power plant associated with a 25% NOx reduction will be in
the range of $2,000,000 to $5,000,000. Clearly, a control strategy that uses existing plant
capital investments and runs on a $2,000 Pentium workstation has tremendous value.

1.2  Document  Organization 

Chapter 2 presents a summary of the required background and a literature review of
the relevant work in neurocontrol, along with references to the literature for more detailed
treatments. This chapter is the best place for readers to become familiar with the notation
used throughout this work.

A  strategy  for  reducing  the  NOx  emissions  from  a  fossil-fired  generating  unit  is  pro- 
vided in  Chapter  3.  The  goal  of  this  section  is  to  provide  a  physical  understanding  for  what 
we  are  asking  the  controllers  to  perform,  thus  providing  justification  that  our  objectives 
are feasible. Chapter 4 develops four detailed neurocontrol designs belonging to the model
predictive, model inverse, and model reference control families. The designs are presented
as  generalized  methodologies  that  are  applicable  to  any  control  application.  Chapter  5  pre- 
sents a  management  and  preprocessing  methodology  for  collecting  data  in  support  of  these 
control  designs  and  the  required  modeling. 

Each  control  design  considered  requires  accurate  process  models  for  its  implementa- 
tion. Chapter  6  presents  a  modeling  methodology  for  developing  these  models.  The  con- 
trollers are  then  implemented  in  Chapter  7.  The  performance  for  each  resulting  controller 
is  then  quantified  using  offline  simulations  and  online  experiments.  Significant  problems 
are  discovered  with  the  controllers  for  which  there  are  no  solutions  in  the  literature.  These 
problems,  along  with  proposed  solutions,  are  investigated  further  in  Chapter  8.  This  sec- 
tion additionally  demonstrates  the  validity  of  these  solutions  by  quantifying  the  perfor- 
mance of  the  revised  controllers. 

The  "key  learnings"  and  extensions  to  this  work  are  summarized  in  Chapter  9. 


CHAPTER  2 
LITERATURE  REVIEW 

This  chapter  provides  background  for  the  rest  of  this  document  in  the  areas  of: 

1)  optimization, 

2)  neural  networks, 

3)  neurocontrol, 

4)  NOx,  and 

5)  fossil-fired  power  plants. 

2.1  Optimization 

Mathematical optimization methods are at the heart of modern modeling and control
applications.  Neural  networks  use  optimization  methods  to  facilitate  learning,  and  control 
applications  apply  these  methods  to  meet  their  control  objectives.  The  notation  and  meth- 
ods presented  in  this  section  will  be  used  extensively  throughout  the  rest  of  this  document. 

Optimization is defined as the process of finding the values of $m$ decision variables
$\mathbf{z} \in K^m$ that minimize a scalar performance objective $J : K^m \to \Re$ [15].
Formally, this optimization task will be represented as

$$\operatorname*{ArgMin}_{\mathbf{z}} \{ J(\mathbf{z}) \}, \tag{1}$$

where $K$ is the decision variable space, which is most often taken to be Euclidean,
$K = \Re$. Optimization methods, also known as mathematical programming methods, can
be classified according to the amount of a priori information available about the system
being optimized [54]. The following sections broadly categorize optimization methods
into the following:

1)  Classical  Analytic  Optimization:  where  the  system  being  optimized  is 
known  completely  or  nearly  completely  and  a  tractable  analytic  solution 
exists 

2)  Descent  Optimization:  where  first  and/or  second-order  partial  deriva- 
tives are  available  everywhere  for  the  parameters  of  the  system  being 
optimized 

3)  Direct  Optimization:  where  little  to  no  a  priori  knowledge  exists  about 
the  physical  structure  of  the  system  being  optimized 

If  the  optimization  problem  involves  objective  functions  or  constraints  which  cannot 
be  stated  as  explicit  functions  of  the  design  variables  or  are  too  complicated  to  manipulate, 
we  cannot  solve  it  by  using  classical  analytic  optimization  methods.  This  work  will  be 
dealing  with  complex  systems  where  little  is  known  a  priori  and  will  therefore  not  con- 
sider analytic  optimization  methods. 

2.1.1  Iterative  Optimization  Methods 

All direct and descent optimization methods are iterative in nature, i.e., they start from
an initial trial solution and proceed toward the minimum point in a sequential manner. An
iterative optimization method is typically judged based on its rate of convergence [54]. In
general, an optimization method is said to have convergence of order $p$ if

$$\frac{\| \mathbf{z}(n+1) - \mathbf{z}^* \|}{\| \mathbf{z}(n) - \mathbf{z}^* \|^p} \le k, \tag{2}$$

where $\mathbf{z}(n)$ and $\mathbf{z}(n+1)$ denote the points obtained at the end of iterations
$n$ and $n+1$, respectively, $\mathbf{z}^*$ represents the optimum point, and $\| \mathbf{z} \|$
denotes the length or norm of the vector $\mathbf{z}$.

If $p = 1$ and $0 < k < 1$, the method is said to be linearly convergent (corresponds to
slow convergence). If $p = 2$, the method is said to be quadratically convergent
(corresponds to faster convergence) [65].
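
As an illustrative sketch (not taken from this work), the ratio in the convergence-order definition can be checked numerically on two scalar problems. The test functions, the step size, and the helper names below are assumptions chosen for the example: fixed-step steepest descent on J(z) = z^2 should give a constant ratio for p = 1, while Newton's method on J(z) = z^3/3 - 2z, which has a local minimum at sqrt(2), should give a bounded ratio for p = 2.

```python
import math


def convergence_ratios(iterates, optimum, p):
    """Return |z(n+1) - z*| / |z(n) - z*|**p for successive scalar iterates."""
    errors = [abs(z - optimum) for z in iterates]
    return [nxt / cur ** p for cur, nxt in zip(errors, errors[1:]) if cur > 0.0]


def steepest_descent_iterates(z0, step, n_iter):
    """Fixed-step steepest descent on J(z) = z**2 (gradient 2*z, optimum 0)."""
    iterates = [z0]
    for _ in range(n_iter):
        z = iterates[-1]
        iterates.append(z - step * 2.0 * z)
    return iterates


def newton_iterates(z0, n_iter):
    """Newton's method on J(z) = z**3/3 - 2*z (J' = z**2 - 2, J'' = 2*z)."""
    iterates = [z0]
    for _ in range(n_iter):
        z = iterates[-1]
        iterates.append(z - (z * z - 2.0) / (2.0 * z))
    return iterates


linear_ratios = convergence_ratios(steepest_descent_iterates(1.0, 0.25, 6), 0.0, p=1)
quadratic_ratios = convergence_ratios(newton_iterates(1.0, 4), math.sqrt(2.0), p=2)
print(linear_ratios)      # ratio with p = 1 for steepest descent
print(quadratic_ratios)   # ratio with p = 2 for Newton's method
```

With these settings the first list is constant at 0.5, i.e., linear convergence with k = 0.5, and the second list stays bounded and settles near 1/(2 sqrt(2)), about 0.35, as the iterates approach the optimum.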

2.1.2  Direct  Optimization 

In problems where analytic solutions are not possible and the design variables are of
mixed type, there is little choice but to use some variation on a direct search methodology.
Direct searches may be broken into the following broad categories.

2.1.2.1  Exhaustive  methods 

In most practical applications, the optimum solution is known to lie within restricted
ranges of the design variables. Exhaustive search methods are applied to problems where
the interval in which the optimum is known to lie is finite. Conceptually, these methods
evaluate the objective function at a predetermined number of points in this interval and
reduce the interval of uncertainty using the assumption of unimodality. Exhaustive methods
include [54]:

1)  Random  Search 

2)  Grid  Search 

3)  Pattern  Directions 

2.1.2.2  Elimination  methods 

The exhaustive search methods are similar to a larger class of algorithms known as
elimination methods, because they search by eliminating parts of the interval. Elimination
methods differ in how they search and discard sub-intervals. The more common elimination
methods include [54]:

1) Dichotomous Search

2) Interval Halving

3)  Fibonacci  Method 

4)  Golden  Section  Method 
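
To make the idea of discarding sub-intervals concrete, the following is a minimal sketch of the Golden Section Method for a unimodal objective of one variable. The function name, the tolerance, and the quadratic test objective are assumptions made for this illustration and are not taken from this work.

```python
import math


def golden_section_search(objective, lower, upper, tolerance=1e-6):
    """Shrink [lower, upper] around the minimum of a unimodal objective."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    left = upper - inv_phi * (upper - lower)
    right = lower + inv_phi * (upper - lower)
    j_left, j_right = objective(left), objective(right)
    while upper - lower > tolerance:
        if j_left < j_right:
            # The minimum cannot lie in (right, upper]; discard that sub-interval.
            upper, right, j_right = right, left, j_left
            left = upper - inv_phi * (upper - lower)
            j_left = objective(left)
        else:
            # The minimum cannot lie in [lower, left); discard that sub-interval.
            lower, left, j_left = left, right, j_right
            right = lower + inv_phi * (upper - lower)
            j_right = objective(right)
    return 0.5 * (lower + upper)


minimum = golden_section_search(lambda z: (z - 2.0) ** 2, 0.0, 5.0)
print(minimum)
```

Each pass reuses one of the two interior evaluations, so only one new objective evaluation is needed per iteration while the interval of uncertainty shrinks by the constant factor of about 0.618.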

2.1.2.3  Interpolation  methods 

Interpolation  methods  iteratively  fit  the  local  performance  surface  with  a  simple  poly- 
nomial form,  and  then  approximate  the  minimum  point  of  the  system  as  the  minimum 
point  of  the  polynomial  [65].  These  methods  are  generally  more  efficient  than  elimination 
methods  and  can  be  accelerated  if  gradient  information  is  available.  Some  of  the  more 
popular  interpolation  methods  include  [54]: 

1)  Quadratic  Method 

2)  Cubic  Method 

3)  Newton  Method 

4)  Quasi-Newton  Method 

5)  Secant  Method 

2.1.2.4  Unrestricted  methods 

When the design variable range is not known, the search must be performed without
restrictions on the values of the variables. Most of these methods use a step size and move
from an initial guess in a favorable direction (positive or negative) [54]. The step size used
must be small in relation to the final accuracy desired. This method is often accelerated by
using a variable step size. These methods include [54]:

1)  Simplex  Method 

2)  Revised  Simplex  Method 

3)  Karmarkar's  Method 

4) Hooke and Jeeves' Method

5) Rosenbrock's Method

In addition, evolutionary computing techniques like genetic algorithms belong to this
category.

2.1.2.5  Line  search 

All of the direct search methods presented above can be applied to both one-dimensional
and n-dimensional searches. A one-dimensional search is often referred to as a line
search since we are searching along a line. The aim of all line searches is to find
$\eta^* \in \Re$ such that

$$\eta^* = \operatorname*{ArgMin}_{\eta} \{ J(\mathbf{z} + \eta \mathbf{z}^d) \}, \tag{3}$$

where $\mathbf{z}$ is the design vector, and $\mathbf{z}^d$ is a known search direction.

One of the most efficient, and hence most popular, line searches uses the Quadratic
Method to find $\eta^*$ [54]. This method has been applied in this work, using the following
algorithm:

I: Normalize the search vector $\mathbf{z}^d$ by dividing each component by the
absolute value of the element of $\mathbf{z}^d$ with the maximum absolute value

II: Evaluate the function at the points $A = 0$ and $D = \eta_0$, where $\eta_0$ is
an initial step size

III: If $J_D > J_A$ then set $C = D$ and $B = \eta_0 / 2$

IV: Else set $B = D$ and evaluate at the point $E = 2\eta_0$. If $J_E > J_D$, then
set $C = E$. Else set $D = E$ and $\eta_0 = 2\eta_0$, and go to step III

V: Calculate the minimum point of the quadratic fitted through $A$, $B$, and $C$,

$$\hat{\eta}^* = \frac{J_A (B^2 - C^2) + J_B (C^2 - A^2) + J_C (A^2 - B^2)}{2 [ J_A (B - C) + J_B (C - A) + J_C (A - B) ]} \tag{4}$$

VI: If $| J_B - J_{\hat{\eta}^*} | < \Delta J^{min}$ then set $\eta^* = \hat{\eta}^*$ and quit

VII: If $\hat{\eta}^* < B$ then set $C = B$ and $B = \hat{\eta}^*$. Else set $A = B$ and $B = \hat{\eta}^*$

VIII: Go to step V

where $\Delta J^{min}$ is the minimum change in $J$ to detect early stopping.
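
The following is a simplified sketch of a quadratic-interpolation line search along the lines of the steps above, not a transcription of the implementation used in this work. It assumes the standard three-point parabola formula for the interpolation step, replaces the bracket update of step VII with a safeguarded version that compares objective values, and omits the normalization of step I because the objective is passed in as a function of the scalar step only. All function and parameter names are hypothetical.

```python
def parabola_minimizer(a, b, c, j_a, j_b, j_c):
    """Stationary point of the parabola through (a, j_a), (b, j_b), (c, j_c)."""
    numerator = j_a * (b * b - c * c) + j_b * (c * c - a * a) + j_c * (a * a - b * b)
    denominator = 2.0 * (j_a * (b - c) + j_b * (c - a) + j_c * (a - b))
    if denominator == 0.0:
        return None  # the three points are collinear; no unique parabola vertex
    return numerator / denominator


def quadratic_line_search(objective, eta0=1.0, min_change=1e-12, max_iter=100):
    """Approximate the eta >= 0 that minimizes objective(eta)."""
    a, j_a = 0.0, objective(0.0)
    b, j_b = eta0, objective(eta0)
    if j_b > j_a:
        # Initial step overshoots: keep it as the upper point and halve the step.
        c, j_c = b, j_b
        b, j_b = 0.5 * c, objective(0.5 * c)
        for _ in range(max_iter):
            if j_b <= j_a:
                break
            c, j_c = b, j_b
            b, j_b = 0.5 * c, objective(0.5 * c)
    else:
        # Initial step still descends: keep doubling until the objective rises.
        c, j_c = 2.0 * b, objective(2.0 * b)
        for _ in range(max_iter):
            if j_c > j_b:
                break
            a, j_a, b, j_b = b, j_b, c, j_c
            c, j_c = 2.0 * b, objective(2.0 * b)
    for _ in range(max_iter):
        eta = parabola_minimizer(a, b, c, j_a, j_b, j_c)
        if eta is None:
            break
        j_eta = objective(eta)
        if abs(j_b - j_eta) < min_change:
            return eta if j_eta < j_b else b
        if j_eta < j_b:
            # The new point is better: it becomes the middle point of the bracket.
            if eta < b:
                c, j_c = b, j_b
            else:
                a, j_a = b, j_b
            b, j_b = eta, j_eta
        elif eta < b:
            a, j_a = eta, j_eta  # the new point only tightens the lower end
        else:
            c, j_c = eta, j_eta  # the new point only tightens the upper end
    return b


# Usage: search from z = (1, 1) along the direction (-1, -1) for J(z) = z1^2 + z2^2.
z, direction = [1.0, 1.0], [-1.0, -1.0]
best_step = quadratic_line_search(
    lambda eta: sum((zi + eta * di) ** 2 for zi, di in zip(z, direction))
)
print(best_step)
```

For an objective that is exactly quadratic along the search line, a single interpolation already lands on the minimizer, which is one reason this line search is attractive inside descent methods.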

2.1.3  Descent-Based  Optimization 

When all values of $\mathbf{z} \in \Re^m$ are possible and the function $J(\mathbf{z})$ has
first and second partial derivatives everywhere, the necessary conditions for a local minimum are

$$\frac{\partial J}{\partial \mathbf{z}} = 0, \tag{5}$$

by which we mean $\partial J / \partial z_i = 0, \; \forall i$, and

$$\frac{\partial^2 J}{\partial \mathbf{z}^2} \ge 0, \tag{6}$$

by which we mean that the $m \times m$ matrix whose components are
$\partial^2 J / \partial z_i \partial z_j$ must be positive semidefinite, i.e., have
eigenvalues that are zero or positive [15].

All points that satisfy (5) are called stationary points. Sufficient conditions for a local
minimum are (5) and

$$\frac{\partial^2 J}{\partial \mathbf{z}^2} > 0, \tag{7}$$

that is, all eigenvalues must be positive. If (5) is satisfied but
$| \partial^2 J / \partial \mathbf{z}^2 | = 0$, that is, the determinant of the matrix is
zero (meaning that one or more of its eigenvalues is zero), additional information is needed
to establish whether or not the point is a minimum. Such a point is called a singular point.
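
As a small illustration of conditions (6) and (7), the sketch below classifies a stationary point of a two-variable objective from the eigenvalues of its symmetric second-derivative matrix. The restriction to two variables, the tolerance, and the function names are assumptions made to keep the example self-contained.

```python
import math


def symmetric_2x2_eigenvalues(h11, h12, h22):
    """Eigenvalues (smallest, largest) of the matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius


def classify_stationary_point(h11, h12, h22, tol=1e-12):
    """Apply the second-order conditions to the Hessian at a stationary point."""
    smallest, _ = symmetric_2x2_eigenvalues(h11, h12, h22)
    if smallest > tol:
        return "local minimum"   # (7): all eigenvalues are positive
    if smallest < -tol:
        return "not a minimum"   # (6) is violated by a negative eigenvalue
    return "singular point"      # a zero eigenvalue: more information is needed


print(classify_stationary_point(2.0, 0.0, 2.0))   # J = z1^2 + z2^2 at the origin
print(classify_stationary_point(2.0, 0.0, -2.0))  # J = z1^2 - z2^2 at the origin
print(classify_stationary_point(2.0, 0.0, 0.0))   # J = z1^2 + z2^4 at the origin
```

The third case shows why a singular point needs additional information: the origin is in fact a minimum of z1^2 + z2^4, but the second derivatives alone cannot establish it.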

2.1.3.1  Methods 

Classical analytic optimization methods use these conditions to solve for the optimal
solution. If the optimization problem involves an objective function or constraints that
cannot be stated as explicit functions of the design variables, or which are too complicated to
manipulate, then descent optimization methods provide efficient alternatives. In general,
these methods will have significantly better convergence characteristics than direct
methods [54].

Descent search methods are iterative algorithms for improving estimates of the decision
variable, $\mathbf{z}$, so as to come closer to satisfying the conditions for a stationary point.
The steps in using the descent method are as follows:

I: Set $n = 0$ and guess at the initial design vector $\mathbf{z}(n)$, usually random

II: Determine the values of $\partial J / \partial \mathbf{z}(n)$

III: Interpreting $\partial J / \partial \mathbf{z}(n)$ as the gradient vector, determine the search
direction $\mathbf{z}^d(n) = f^d(\partial J / \partial \mathbf{z}(n))$ as a function of this gradient

IV: Determine the step size to be taken $\eta(n) = f^\eta(\mathbf{z}^d(n))$, as a function
of this direction

V: Update the estimates of $\mathbf{z}(n+1) = \mathbf{z}(n) + \eta(n) \mathbf{z}^d(n)$

VI: Repeat II until $(\partial J / \partial \mathbf{z}(n)) (\partial J / \partial \mathbf{z}(n))^T$ is very small

The variations in descent-based optimization can be expressed as variations in the
determination of the direction vector $f^d$ and the step size $f^\eta$. Some of the more common
variations include [15][54][65]:

1) Steepest Descent:

$$\mathbf{z}^d(n) = -\partial J / \partial \mathbf{z}(n) \tag{8}$$

$$\eta(n) = \text{constant} \tag{9}$$

2) Steepest Descent with Momentum:

$$\mathbf{z}^d(n) = -(1 - \rho) \, \partial J / \partial \mathbf{z}(n) + \rho \, \mathbf{z}^d(n-1) \tag{10}$$

$$\eta(n) = \text{constant} \tag{11}$$

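3) Conjugate Gradients:

$$\mathbf{z}^d(n+1) = -\frac{\partial J}{\partial \mathbf{z}(n+1)} + \frac{(\partial J / \partial \mathbf{z}(n+1))^T [ \partial J / \partial \mathbf{z}(n+1) - \partial J / \partial \mathbf{z}(n) ]}{(\partial J / \partial \mathbf{z}(n))^T \, \partial J / \partial \mathbf{z}(n)} \, \mathbf{z}^d(n) \tag{12}$$

$$\eta(n) = \operatorname{LineSearch}(\mathbf{z}^d(n)) \tag{13}$$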
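
The descent steps and the first two variations can be sketched as one loop with a pluggable direction rule. This is a minimal illustration under stated assumptions: the momentum rule blends the current gradient with the previous search direction, the step size is a fixed constant, and the quadratic test objective J(z) = z1^2 + 10 z2^2, the momentum coefficient, and all names are chosen only for the example. The conjugate gradient variation is omitted for brevity.

```python
def descend(gradient, z0, direction_rule, step_size, tol=1e-10, max_iter=10_000):
    """Generic descent loop: gradient, direction, fixed step, update, repeat."""
    z = list(z0)
    previous_direction = [0.0] * len(z)
    for n in range(max_iter):
        g = gradient(z)
        if sum(gi * gi for gi in g) < tol:  # squared gradient norm is very small
            return z, n
        direction = direction_rule(g, previous_direction)
        z = [zi + step_size * di for zi, di in zip(z, direction)]
        previous_direction = direction
    return z, max_iter


def steepest_descent(g, previous_direction):
    """Direction is the negative gradient; the previous direction is ignored."""
    return [-gi for gi in g]


def make_momentum_rule(rho):
    """Direction mixes the negative gradient with the previous direction."""
    def rule(g, previous_direction):
        return [-(1.0 - rho) * gi + rho * di for gi, di in zip(g, previous_direction)]
    return rule


def gradient(z):
    """Gradient of the test objective J(z) = z1**2 + 10 * z2**2."""
    return [2.0 * z[0], 20.0 * z[1]]


z_sd, n_sd = descend(gradient, [1.0, 1.0], steepest_descent, step_size=0.09)
z_mom, n_mom = descend(gradient, [1.0, 1.0], make_momentum_rule(0.5), step_size=0.09)
print(z_sd, n_sd)
print(z_mom, n_mom)
```

Keeping the direction rule separate from the loop mirrors the observation that the variations differ only in how the direction and the step size are determined; a line search could be substituted for the constant step in the same way.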

2.2  Neural  Networks 

Artificial neural networks (ANNs) are biologically motivated data processing structures
that consist of a large number of relatively simple, highly interconnected neurons or
processing elements (PEs) [24]. In general, these structures provide an inductive
mathematical model that can be represented by

$$\mathbf{y} = f(\mathbf{u}, \mathbf{w}), \tag{14}$$

where $f : \Re^{N_u} \to \Re^{N_y}$ is the model's input/output map, and
$\mathbf{y} \in \Re^{N_y}$, $\mathbf{u} \in \Re^{N_u}$, and $\mathbf{w} \in \Re^{N_w}$ are its
outputs, inputs, and coefficients, respectively. The coefficients in an ANN map are commonly
referred to as weights. Artificial neural networks infer or learn the relationships between
$\mathbf{y}$ and $\mathbf{u}$ by observing actual process data. In this way, ANNs
can be applied to generalized regression and classification inference problems.
ANN  architectures  possess  two  fundamental  properties: 

1)  They  are  capable  of  approximating  to  arbitrary  accuracy  any  continuous 
function,  i.e.,  they  are  universal  mappers  [24]. 

2)  They  have  robust  optimization  convergence  properties  with  respect  to 
the  optimization  of  their  coefficients,  i.e.,  they  are  robust  learners  [26]. 

These  properties  make  ANNs  a  useful  tool  for  empirical  modeling  tasks  where  little  to 
no  a  priori  information  is  available  about  the  underlying  process. 

2.2.1  Model  Architecture 

There are many types of ANNs in the literature, each with specific advantages when
modeling various types of processes [14]. The two primary factors which differentiate
between ANN models are their architecture and their learning rule. A model's architecture
defines the way in which it processes input information to produce output information,
i.e., the form of its mathematical input/output map $f$.

2.2.1.1  Multi-layer  perceptron 

Most ANNs presented in the literature are static mappers, i.e., they are only capable of
modeling static or steady-state process relationships. By far the most popular and widely
applied ANN architecture is called the multilayer perceptron (MLP) [24]. This network
consists of fully interconnected layers of PEs with logistic response characteristics. The
MLP network is typically configured with one or two hidden layers of PEs. A two-layer
MLP is illustrated in Figure 1.

Figure 1: Multilayer perceptron model architecture.


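Formally, using matrix algebra this architecture is given by

$$f(\mathbf{u}, \mathbf{w}) = \mathbf{W}^{o}\,\sigma\!\left(\mathbf{W}^{h2}\,\sigma\!\left(\mathbf{W}^{h1}\mathbf{u} + \mathbf{b}^{h1}\right) + \mathbf{b}^{h2}\right) + \mathbf{b}^{o}, \tag{15}$$

where $\mathbf{W}^{h1} \in \mathbb{R}^{N_{h1} \times N_u}$ is the matrix of weights for the first hidden layer,
$\mathbf{b}^{h1} \in \mathbb{R}^{N_{h1}}$ is a vector of bias values for this layer,
$\mathbf{W}^{h2} \in \mathbb{R}^{N_{h2} \times N_{h1}}$ is the matrix of weights for the second hidden layer,
$\mathbf{b}^{h2} \in \mathbb{R}^{N_{h2}}$ is the bias vector for this layer,
$\mathbf{W}^{o} \in \mathbb{R}^{N_y \times N_{h2}}$ is the matrix of weights for the output layer,
$\mathbf{b}^{o} \in \mathbb{R}^{N_y}$ is the corresponding bias vector, $N_{h1}$ is the number of PEs in the
first hidden layer, $N_{h2}$ is the number of PEs in the second hidden layer, $\sigma$ is the tanh
logistic function, and the set
$\mathbf{w} = \{\mathbf{W}^{h1}, \mathbf{b}^{h1}, \mathbf{W}^{h2}, \mathbf{b}^{h2}, \mathbf{W}^{o}, \mathbf{b}^{o}\}$
represents the model's weights.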
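The forward pass of Eq. (15) can be sketched in a few lines. This is only an illustration, assuming tanh hidden layers and a linear output layer; the helper names (`affine`, `mlp_forward`, `init_weights`), the layer sizes, and the random initialization are hypothetical choices, not part of the original design.

```python
import math
import random


def affine(W, x, b):
    """Return W x + b for a matrix W (list of rows), a vector x, and a bias b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp_forward(u, w):
    """Evaluate the two-hidden-layer MLP map for a single input vector u."""
    h1 = [math.tanh(a) for a in affine(w["W_h1"], u, w["b_h1"])]   # first hidden layer
    h2 = [math.tanh(a) for a in affine(w["W_h2"], h1, w["b_h2"])]  # second hidden layer
    return affine(w["W_o"], h2, w["b_o"])                          # linear output layer


def init_weights(n_u, n_h1, n_h2, n_y, scale=0.5, seed=0):
    """Draw small random weights and zero biases (illustrative assumption)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_h1": mat(n_h1, n_u), "b_h1": [0.0] * n_h1,
        "W_h2": mat(n_h2, n_h1), "b_h2": [0.0] * n_h2,
        "W_o": mat(n_y, n_h2), "b_o": [0.0] * n_y,
    }


w = init_weights(n_u=3, n_h1=5, n_h2=4, n_y=2)
y = mlp_forward([0.1, -0.2, 0.3], w)
y_zero = mlp_forward([0.0, 0.0, 0.0], w)
```

With zero biases, a zero input propagates to a zero output, which gives a quick sanity check of the layer wiring.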

When  the  process  being  modeled  is  dynamic,  i.e.,  its  current  output  is  a  function  of  its 
current  state  as  well  as  previous  process  states,  static  models  are  not  well  suited.  For  such 
situations,  models  which  are  able  to  extract  both  static  and  temporal  process  relationships 
are  required.  The  most  common  method  for  creating  dynamic  neural  networks  is  to  simply 
place  dynamic  PEs  in  the  input  layer  of  a  static  MLP  [17].  These  models  have  been 
referred to by many researchers as dynamic neural networks (DNNs). The dynamic PEs
can have response characteristics based on a priori process knowledge or contain adaptive
memory  mechanisms  or  filters. 

2.2.1.2  Time-delay  neural  network 

The most common DNN is called the time-delay neural network (TDNN) [24]. This
architecture consists of an MLP where each input PE has an adaptive linear FIR filter, as
illustrated in Figure 2.


Figure  2:  TDNN  input  PE  connectivity. 

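Formally, the TDNN can be described by

$$u_{i,j}(t) = \begin{cases} u_i(t), & j = 1 \\ u_{i,j-1}(t-1), & j \in [2, N_t] \end{cases} \qquad \forall\, i, j \tag{16}$$

$$f^{TDNN}(\mathbf{u}, \mathbf{w}) = f^{MLP}\!\left(T(\mathbf{u}(t), N_t), \mathbf{w}\right), \tag{17}$$

where $T: \mathbb{R}^{N_u} \to \mathbb{R}^{N_u N_t}$ represents the tapped-delay line operator, and $N_t$ is the
number of taps in the delay line.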
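A tapped-delay line for a single input signal can be sketched as follows. The class name `TappedDelayLine` and the assumption that the line starts at rest (all past inputs zero) are illustrative choices; in a TDNN the returned taps would be fed to the MLP as its input vector.

```python
from collections import deque


class TappedDelayLine:
    """Tapped-delay line for one input signal: tap j holds u(t - j + 1)."""

    def __init__(self, n_taps):
        # Assumption: the line starts at rest, i.e. all past inputs are zero.
        self.taps = deque([0.0] * n_taps, maxlen=n_taps)

    def step(self, u_t):
        """Shift in the newest sample and return all taps, newest first."""
        self.taps.appendleft(u_t)  # the oldest sample falls off the right end
        return list(self.taps)


line = TappedDelayLine(n_taps=3)
history = [line.step(u) for u in [1.0, 2.0, 3.0, 4.0]]
```

Each call to `step` advances time by one sample, so after four samples the line holds the three most recent inputs.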

2.2.1.3  Gamma  neural  network 

The main disadvantage to most DNNs is that they preprocess the input to extract fixed
and known dynamics of the process data rather than learn these dynamics from this data.
The TDNN can be considered as an exception to this rule, but here the process must have
finite impulse response (FIR) dynamics of known order. The Gamma Neural Network
(GNN) [17] represents an important class of dynamic ANN models that is able to learn
infinite impulse response (IIR) process dynamics without a priori knowledge about the
structure or order of these dynamics.

Figure 3: Gamma memory processing element.

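The GNN architecture is conceptually an MLP with adaptive Gamma Filters (GF)
placed at the output of its input and hidden layer PEs. A single GF is illustrated in Figure
3. Formally, the GNN is given by

$$u_{i,j}(t) = \begin{cases} u_i(t), & j = 1 \\ \mu_i\, u_{i,j-1}(t) + (1 - \mu_i)\, u_{i,j}(t-1), & j \in [2, N] \end{cases} \tag{18}$$

$$f^{GNN}(\mathbf{u}, \mathbf{w}) = \mathbf{W}^{o}\,\sigma\!\left(\mathbf{W}^{h2}\, G\!\left(\sigma\!\left(\mathbf{W}^{h1}\, G(\mathbf{u}(t), N^{u}) + \mathbf{b}^{h1}\right), N^{h1}\right) + \mathbf{b}^{h2}\right) + \mathbf{b}^{o}, \tag{19}$$

where $G: \mathbb{R}^{n} \to \mathbb{R}^{nN}$ represents the GF, $N^{u}$ is the number of GF taps in the input
layer, and $N^{h1}$ is the number of GF taps in the first hidden layer. This architecture has
been presented without a GF in the second hidden layer, but such a configuration would be
a straightforward extension.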
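A single Gamma Filter can be sketched as a short recursion over its taps. The sketch assumes the tap update $u_j(t) = \mu\, u_{j-1}(t) + (1 - \mu)\, u_j(t-1)$ with $u_1(t) = u(t)$; some formulations of the gamma memory instead use $u_{j-1}(t-1)$ in the first term. The function name and the fixed value of `mu` are hypothetical; in a GNN, `mu` would be adapted during training.

```python
def gamma_memory_step(prev_taps, u_t, mu):
    """Advance a gamma memory by one time step.

    prev_taps holds [u_1(t-1), ..., u_N(t-1)]; the return value holds the taps at time t.
    """
    taps = [u_t]  # the first tap is the raw input
    for j in range(1, len(prev_taps)):
        # Each later tap is a leaky average of the preceding tap and its own past value.
        taps.append(mu * taps[j - 1] + (1.0 - mu) * prev_taps[j])
    return taps


# Step response of a three-tap memory that starts at rest.
first = gamma_memory_step([0.0, 0.0, 0.0], 1.0, mu=0.5)
taps = [0.0, 0.0, 0.0]
for _ in range(200):
    taps = gamma_memory_step(taps, 1.0, mu=0.5)
```

Because every tap feeds back on its own past value, the memory has an infinite impulse response, and a constant input eventually drives all taps to that constant.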
2.2.1.4 Nonlinear state-space model

The GNN uses the Gamma Filter to represent process dynamics. The GF approximates
these dynamics from a Gamma memory kernel basis. The Gamma kernels are able to
model an important class of dynamics but may not be the best representation for general
process dynamics. An alternative approach to using Gamma kernels is to design the ANN
architecture with an explicit state and data flow structure that is capable of learning
universal process dynamics. This approach is the goal of the nonlinear state-space model
(NLSS) which implements process dynamics directly as a nonlinear state evolution
equation and an output observation equation [43], as given by

$$\mathbf{x}(t) = f^{x}\!\left(\mathbf{x}(t-1), \mathbf{u}(t), \mathbf{w}^{x}\right) \tag{20}$$

$$\mathbf{y}(t) = f^{y}\!\left(\mathbf{x}(t), \mathbf{u}(t), \mathbf{w}^{y}\right), \tag{21}$$

where $\mathbf{x}(t) \in \mathbb{R}^{N_x}$ is the model's state vector consisting of $N_x$ hidden PEs, $f^{x}$ is an ANN
map describing the time evolution of this state, $\mathbf{w}^{x}$ are the weights of this state network,
$f^{y}$ is a second ANN map describing how outputs are produced from this state, and
$\mathbf{w}^{y}$ are the weights of this output network. Figure 4 illustrates the configuration of an
NLSS network.

Figure 4: Nonlinear state space neural network configuration.

2.2.2 Learning Algorithms

The biological roots of neural networks are responsible for the widespread use of the
term learning to describe the process during which the network parameters are changed to
improve the performance of the neural-network-based system. An ANN learning
algorithm specifies how its weights are updated in response to training data. These algorithms
are simply optimization methods applied to the task of finding the best model weights to
minimize a specified modeling objective $J$, i.e., $\operatorname{ArgMin}_{\mathbf{w}}\{J\}$. In general, any of the
optimization methods presented above can be used to solve this problem.

One of the most significant breakthroughs in the field of ANNs was the realization that
the chain rule for ordered partial derivatives provides a mechanism for deriving the
first-order gradients for all weights in a model, even though the modeling objective is only an
explicit function of the model's outputs [72]. Recall that the chain rule for ordered partial
derivatives is given by

$$\frac{\partial^{+} J}{\partial x_i} = \frac{\partial J}{\partial x_i} + \sum_{j > i} \frac{\partial^{+} J}{\partial x_j}\,\frac{\partial x_j}{\partial x_i}. \tag{22}$$

Applying the chain rule allows sensitivities to be calculated from the output of the
model back to its input, which is why the resulting algorithm has been coined
"backpropagation" in the literature [40]. When the variables are temporally related, the chain rule has
the following form

$$\frac{\partial^{+} J}{\partial x_i(t)} = \frac{\partial J}{\partial x_i(t)} + \sum_{\tau \ge 0}\sum_{j} \frac{\partial^{+} J}{\partial x_j(t+\tau)}\,\frac{\partial x_j(t+\tau)}{\partial x_i(t)}, \tag{23}$$

where the sums run over the variables $x_j(t+\tau)$ that follow $x_i(t)$ in the ordering.
Here, in addition to backpropagating sensitivities from the model's output to its input,
the sensitivities are backpropagated through time.

The most common optimization method is simply steepest-descent with momentum,
although many variations have been demonstrated to significantly improve convergence
[10]. The issues leading to the selection of one optimization method over another are:

1)  Convergence  Rate 

2)  Implementation  Complexity 

3)  Configuration  Complexity 

4)  Avoidance  of  Local  Minima 

5)  Sensitivity  to  Correlation 

The most common modeling objective used in ANNs is the mean squared error (MSE)
between the model's output $\mathbf{y}$ and a specified desired response $\mathbf{d} \in \mathbb{R}^{N_y}$, as given by

$$J = \frac{1}{N}\sum_{t=1}^{N}\sum_{i=1}^{N_y}\left(d_i(t) - y_i(t)\right)^2, \tag{24}$$

where $N$ is the number of samples in the training dataset. Learning rules which use the
MSE criterion are commonly classified as supervised learning rules, because of the
presence of a "teacher" implied by the explicit specification of a desired response. Learning
rules without explicit reference to a desired response for the model in the objective
function are commonly referred to as unsupervised learning rules.
2.2.3  Generalization 

As universal mappers, ANNs are almost always more complex than the relationships
that we seek to uncover. The net result is that ANNs are notorious for over-fitting a
training dataset, i.e., performing well on training data but poorly on a blind test dataset [24]. It
is very important to optimize the complexity of the neural network in order to achieve the
best generalization.

2.2.3.1  Bias  and  variance 

Considerable insight into this phenomenon can be obtained by introducing the concept
of the bias-variance trade-off. Bishop [13] observes that the generalization error $\zeta$, using
the Euclidean norm, will depend on the particular dataset $D$ on which the network was
trained. The dependence on $D$ can be eliminated by considering an average over the
complete ensemble of datasets, which can be written as

$$\zeta = E_D\!\left[\left(\langle d|\mathbf{u}\rangle - f(\mathbf{u}, \mathbf{w})\right)^2\right], \tag{25}$$

where $\langle d|\mathbf{u}\rangle$ denotes the conditional average, or regression, of the desired data given by

$$\langle d|\mathbf{u}\rangle = \int d\; p(d|\mathbf{u})\, \mathrm{d}d, \tag{26}$$

and $p(d|\mathbf{u})$ is the conditional density of the desired variable $d$ conditioned on the input
vector $\mathbf{u}$. Bishop [13] demonstrates that this generalization error can be decomposed into
the sum of the bias squared plus the variance

$$\zeta = \left(E_D[f(\mathbf{u}, \mathbf{w})] - \langle d|\mathbf{u}\rangle\right)^2 + E_D\!\left[\left(f(\mathbf{u}, \mathbf{w}) - E_D[f(\mathbf{u}, \mathbf{w})]\right)^2\right]. \tag{27}$$

A model which is too simple, or too inflexible, will have a large bias, while one which
has too much flexibility in relation to the particular dataset will have a large variance. Bias
and variance are complementary quantities, and the best generalization is obtained when
we have the best compromise between the conflicting requirements of small bias and
small variance. The variance of the prediction is addressed further in Section 2.2.4,
"Standard Errors."

For any given dataset, there is some optimal balance between bias and variance which
gives the smallest average generalization error. In order to improve the performance of the
network further we need to be able to reduce the bias while simultaneously reducing the
variance. The most straightforward way of achieving this is to use more data samples. As
the number of data samples is increased we can afford to use more complex models, hence
reducing the bias, while at the same time ensuring that each model is more heavily
constrained by the data, thereby also reducing the variance. If the number of data samples is
increased rapidly in relation to the model complexity we can find a sequence of models
such that both bias and variance decrease. Models such as ANNs can in principle provide
consistent estimators of arbitrary accuracy as the number of data points is increased to
infinity. Note that, even if both the bias and variance can be reduced to zero, the
generalization error will still be nonzero due to the intrinsic noise in the data.
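The decomposition in Eq. (27) can be checked numerically by averaging over many simulated datasets. The sketch below evaluates it at a single input point rather than integrating over inputs, and it uses two deliberately extreme stand-in models (a constant predictor and a nearest-neighbour predictor) in place of ANNs; the true regression function, noise level, and sample sizes are all illustrative assumptions.

```python
import math
import random


def true_regression(u):
    """Assumed conditional average of the desired response given u."""
    return math.sin(2.0 * math.pi * u)


def sample_dataset(rng, n=20, noise=0.3):
    """Draw one training dataset of (input, noisy desired response) pairs."""
    return [(u, true_regression(u) + rng.gauss(0.0, noise))
            for u in (rng.random() for _ in range(n))]


def constant_model(data, u0):
    """Too simple: ignores the input and predicts the mean response."""
    return sum(d for _, d in data) / len(data)


def nearest_neighbour_model(data, u0):
    """Too flexible: copies the response of the closest training input."""
    return min(data, key=lambda pair: abs(pair[0] - u0))[1]


def bias_variance(model, u0, n_datasets=2000, seed=0):
    """Estimate bias squared, variance, and total error at u0 over many datasets."""
    rng = random.Random(seed)
    preds = [model(sample_dataset(rng), u0) for _ in range(n_datasets)]
    mean_pred = sum(preds) / len(preds)
    bias_sq = (mean_pred - true_regression(u0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
    error = sum((p - true_regression(u0)) ** 2 for p in preds) / len(preds)
    return bias_sq, variance, error


b_const, v_const, e_const = bias_variance(constant_model, u0=0.25)
b_nn, v_nn, e_nn = bias_variance(nearest_neighbour_model, u0=0.25)
```

The constant model shows the large-bias, small-variance end of the trade-off and the nearest-neighbour model the opposite end; in both cases the error equals bias squared plus variance up to rounding.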

One rarely has infinite data, and practical issues like training time can make simply adding
more data points impractical. There are several practical and widely practiced ways to improve
model generalization; we start with regularization.

2.2.3.2  Regularization 

Regularization was originally proposed by Tikhonov [62] as a method for solving
ill-posed problems. The basic idea is to stabilize the solution by means of some auxiliary
nonnegative functional that embeds prior information, e.g., smoothness constraints on the
input/output mapping. Regularization is able to transform an ill-posed problem into a
well-posed problem [48].

Tikhonov's regularization theory uses a regularization penalty term of the form

$$\Omega = \|P f\|^2, \tag{28}$$

where $P$ is a linear (pseudo) differential operator. This penalty term is added to the
objective function to give

$$\tilde{J} = J + \Gamma\,\Omega, \tag{29}$$

where $\Gamma$ is the regularization parameter. Prior information about the form of the solution
(i.e., the plant) is embedded in the operator $P$. The operator $P$ is referred to as a stabilizer
in the sense that it stabilizes the solution $f$, making it smooth.

The appropriate choice for $P$ and the solution to (28) require functional analysis and
are beyond the scope of this work. The most commonly used form of regularizer, however,
is quite simple to implement. Weight decay regularizer terms consist of the sum of squares
of the adaptive parameters in the network

$$\Omega = \frac{1}{2}\sum_{i} w_i^2, \tag{30}$$

where the sum runs over the weights and biases. In conventional curve fitting the use of
this form of regularizer is called ridge regression. It has been found empirically that a
regularizer of this form can lead to significant improvements in generalization [29].
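The effect of the weight decay term can be seen on a one-weight model, where the gradient of the penalty is simply the regularization parameter times the weight. The sketch assumes a scalar model $y = w u$ trained by plain steepest descent; the function names, the learning rate, and the value of the regularization parameter (`gamma` in the code) are illustrative assumptions.

```python
def weight_decay_penalty(weights):
    """Half the sum of squared weights."""
    return 0.5 * sum(wi * wi for wi in weights)


def regularized_objective(w, data, gamma):
    """Mean squared error plus gamma times the weight decay penalty."""
    mse = sum((d - w * u) ** 2 for u, d in data) / len(data)
    return mse + gamma * weight_decay_penalty([w])


def fit_with_weight_decay(data, gamma, lr=0.1, epochs=200):
    """Steepest descent on the regularized objective for the model y = w * u."""
    w = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_mse = sum(-2.0 * (d - w * u) * u for u, d in data) / n
        grad_penalty = gamma * w  # derivative of gamma * 0.5 * w**2
        w -= lr * (grad_mse + grad_penalty)
    return w


# Noise-free data from d = 2 u; without regularization the optimum is w = 2.
data = [(u, 2.0 * u) for u in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w_plain = fit_with_weight_decay(data, gamma=0.0)
w_decay = fit_with_weight_decay(data, gamma=1.0)
```

For this dataset the regularized optimum can be worked out by hand: the mean of $u^2$ is 0.5, so setting the gradient to zero gives $w = 2 / (1 + \Gamma)$, i.e. $w = 1$ for $\Gamma = 1$.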

2.2.3.3  Growing  and  pruning  algorithms 

The topology of a neural network, i.e., its number of units and interconnections, can have a
significant impact on its performance. Regularization helps to minimize this impact when
the complexity of the network is larger than required for the particular application.
Clearly, however, a better approach is to match the complexity of the model with the
complexity of the application. Various techniques have been developed for optimizing the
topology, in some cases as part of the network training process itself [43]. It is important
to distinguish between two distinct aspects of the topology selection problem. First, we
need a systematic procedure for exploring some space of possible architectures. Second,
we need some way of deciding which of the architectures considered should be selected.

A straightforward approach to network structure optimization involves an exhaustive
search through a restricted class of network topologies. This approach requires significant
computational effort and only searches a very restricted class of network topologies. Much
of the computational burden can be lessened by starting with a network which is relatively
small and allowing new units and connections to be added during training. This
approach was shown to be successful by Bello [10], who used the weights from one
network as the initial guess for training the next network (with the extra weights initialized
randomly). Techniques of this form are called growing algorithms. An alternative
approach is to start with a relatively large network and gradually remove units; these are
known as pruning algorithms. Most of these procedures are ad hoc and tailored to specific
applications; that is not to say, however, that they are ineffective.

More  recent  work  has  taken  advantage  of  developments  in  discrete  optimization  using 
genetic  algorithms  [36].  Genetic  algorithms  provide  a  methodical  way  of  searching  large 
discrete  spaces  more  efficiently. 

2.2.3.4  Cross-validation 

An alternative to regularization as a way of controlling the effective complexity of a
network is the procedure of cross-validation [13]. The training of a nonlinear model
corresponds to the iterative reduction of the error function defined with respect to the training
dataset. During training, the error will generally decrease monotonically as a function of
the number of presentations of the training dataset, i.e., epochs. However, the
generalization error, with respect to an independent dataset called the validation dataset, often shows
a decrease at first, followed by an increase as the network starts to over-fit. Training can
therefore be stopped at the point of smallest error with respect to the validation dataset, as
this produces a network with the smallest generalization error (or at least an
approximation thereof).
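The stopping rule described above can be sketched as a training loop that remembers the weights with the smallest validation error. The function `train_with_early_stopping`, its `patience` parameter, and the toy `step` and `validation_error` functions are all hypothetical stand-ins: in practice `step` would run one epoch of training and `validation_error` would evaluate the network on the validation dataset.

```python
import copy


def train_with_early_stopping(weights, step, validation_error, max_epochs=200, patience=10):
    """Train epoch by epoch and return the weights with the lowest validation error.

    Training stops once the validation error has not improved for `patience` epochs.
    """
    best_weights = copy.deepcopy(weights)
    best_error = validation_error(weights)
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        weights = step(weights)
        error = validation_error(weights)
        if error < best_error:
            best_weights, best_error, best_epoch = copy.deepcopy(weights), error, epoch
        elif epoch - best_epoch >= patience:
            break  # validation error has been rising: the model is over-fitting
    return best_weights, best_error, best_epoch


def step(weights):
    """Toy stand-in for one training epoch: the weight keeps drifting upward."""
    return [weights[0] + 0.1]


def validation_error(weights):
    """Toy stand-in for a U-shaped validation curve with its minimum at w = 3."""
    return (weights[0] - 3.0) ** 2 + 1.0


best_weights, best_error, best_epoch = train_with_early_stopping([0.0], step, validation_error)
```

Keeping a copy of the best weights, rather than the last ones, matters because the loop only detects the minimum after it has already passed it.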

2.2.3.5  Committees  of  networks 

In practice, building neural network models requires the training of many different
candidate networks and then the selection of the best performer. Typically selection is
based on each network's performance on a third dataset not used for training or
cross-validation. There are two disadvantages to this approach. First, all of the effort involved in
training the remaining networks is wasted, and second, the generalization performance on
the validation dataset has a random component due to the noise on the data [13]. The
network which performed the best on this dataset might not be the one with the lowest
generalization error. Recall that the generalization error is averaged over all datasets (25).

These limitations can be overcome by combining the networks together to form a
committee [47][46]. This approach was shown to provide significant improvements in the
generalization error. Denote the committee prediction as

$$f^{c}(\mathbf{u}, \mathbf{W}) = \frac{1}{N_c}\sum_{i=1}^{N_c} f_i(\mathbf{u}, \mathbf{w}_i), \tag{31}$$

where $N_c$ is the number of committee members, $\mathbf{w}_i$ are the weights of committee member
$i$, and $\mathbf{W} = \{\mathbf{w}_i\}_{i=1}^{N_c}$ is the set of all weights for the committee.

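Bishop [13] shows that the generalization error of the committee never exceeds the
average generalization error of its individual members, and that if the errors of the
individual members have zero mean and are decorrelated, the committee error is the average
member error divided by the number of members.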
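The benefit of averaging can be illustrated without training any networks by simulating the prediction errors of the committee members directly. The sketch assumes five members whose errors are independent, zero-mean Gaussian draws, which is the idealized decorrelated case; the function name and sample sizes are illustrative.

```python
import random


def committee_errors(member_errors):
    """Return (average member squared error, committee squared error).

    member_errors[p][i] is the prediction error of member i at test point p.
    """
    n_points = len(member_errors)
    n_members = len(member_errors[0])
    average_member = sum(sum(e * e for e in row) / n_members for row in member_errors) / n_points
    # The committee error at each point is the mean of the member errors.
    committee = sum((sum(row) / n_members) ** 2 for row in member_errors) / n_points
    return average_member, committee


rng = random.Random(0)
errors = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(2000)]
e_av, e_com = committee_errors(errors)
```

With independent errors the committee error comes out close to one fifth of the average member error; with correlated errors the reduction would be smaller, but the committee error still cannot exceed the average.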

2.2.4  Standard  Errors 

Tibshirani [61] reviews a number of methods for estimating the standard error of
predicted values from a multi-layer perceptron. These include the delta method (direct
evaluation of maximum likelihood estimates based on the Hessian matrix), the "sandwich"
estimator and the bootstrap method. Tibshirani offers the following observations:

1) The bootstrap methods provided the most accurate estimates of the standard
errors of predicted values.

2) The non-simulation methods (delta and sandwich) missed the substantial
variability due to the random initial weights from the multiple training runs.

The non-simulation methods are solved analytically, and therefore require a unique
solution for each network topology, whereas the bootstrap methods apply to all network
topologies, as well as to non-neural-network paradigms. The additional fact, as noted above,
that the bootstrap methods account for local minima provides a strong argument for their
use.

2.2.4.0.1  Bootstrap  methods 

Bootstrap methods work by creating many pseudo-replicates ("bootstrap datasets")
from the training dataset and then reestimating the model's weights $\mathbf{w}$ on each bootstrap
dataset. There are two different approaches to bootstrapping [9]. One can consider each
training case as a sampling unit, and sample with replacement from the training dataset
cases to create a bootstrap sample. This is often called the "bootstrap pairs" method. The
bootstrap pairs sampling algorithm is given by:

I: Generate $N_b$ samples, each one of size $N$ drawn with replacement
from the $N$ training observations $\{\mathbf{u}(i), \mathbf{d}(i)\}_{i=1}^{N}$, and denote the $b$-th
sample by $\{\mathbf{u}^{b}(i), \mathbf{d}^{b}(i)\}_{i=1}^{N}$.

II: For each bootstrap sample $b \in [1, N_b]$, find

$$\operatorname{ArgMin}_{\mathbf{w}^{b}}\{J(\mathbf{d}^{b} - f(\mathbf{u}^{b}, \mathbf{w}^{b}))\} \tag{32}$$

III: Estimate the standard error of the $i$-th prediction as

$$\mathrm{se}(\bar{y}_i) = \left[\frac{1}{N_b - 1}\sum_{b=1}^{N_b}\left(f(\mathbf{u}_i, \mathbf{w}^{b}) - \bar{y}_i\right)^2\right]^{1/2}, \tag{33}$$

where

$$\bar{y}_i = \frac{1}{N_b}\sum_{b=1}^{N_b} f(\mathbf{u}_i, \mathbf{w}^{b}). \tag{34}$$

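On the other hand, one can consider the predictors as fixed, treat the model residuals
$\mathbf{d} - \mathbf{y}$ as the sampling units, and create a bootstrap sample by adding residuals to the
model fit $\mathbf{y}$. This is called the "bootstrap residuals" approach:

I: Find $\operatorname{ArgMin}_{\mathbf{w}}\{J(\mathbf{d} - f(\mathbf{u}, \mathbf{w}))\}$ from the $N$ training observations
$\{\mathbf{u}(i), \mathbf{d}(i)\}_{i=1}^{N}$ and let $\mathbf{r}(i) = \mathbf{d}(i) - f(\mathbf{u}(i), \mathbf{w})$.

II: Generate $N_b$ bootstrap samples, each one of size $N$ drawn with
replacement from $\{\mathbf{r}(i)\}_{i=1}^{N}$, and denote the $b$-th sample by $\{\mathbf{r}^{b}(i)\}_{i=1}^{N}$,
letting

$$\mathbf{d}^{b}(i) = f(\mathbf{u}(i), \mathbf{w}) + \mathbf{r}^{b}(i). \tag{35}$$

III: For each bootstrap sample $b \in [1, N_b]$, find

$$\operatorname{ArgMin}_{\mathbf{w}^{b}}\{J(\mathbf{d}^{b} - f(\mathbf{u}, \mathbf{w}^{b}))\} \tag{36}$$

IV: Estimate the standard error of the $i$-th prediction as

$$\mathrm{se}(\bar{y}_i) = \left[\frac{1}{N_b - 1}\sum_{b=1}^{N_b}\left(f(\mathbf{u}_i, \mathbf{w}^{b}) - \bar{y}_i\right)^2\right]^{1/2}. \tag{37}$$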

Note that both of these methods require fitting a model (retraining the network) $N_b$
times. Typically $N_b$ is in the range $20 < N_b < 200$. In simple linear least squares
regression, it can be shown that the bootstrap methods both agree with the standard least squares
formula as $N_b \to \infty$.

The bootstrap methods will arrive at confidence intervals

$$\bar{y}_i - c_{conf}\,\mathrm{se}(\bar{y}_i) \le y_i \le \bar{y}_i + c_{conf}\,\mathrm{se}(\bar{y}_i), \tag{38}$$

where $c_{conf}$ depends on the desired confidence level $1 - \alpha$. The factor $c_{conf}$ can be taken
from a table with the percentage points of the Student's $t$-distribution with the number of
degrees of freedom equal to the number of bootstrap runs $N_b$.
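The bootstrap pairs procedure can be sketched end to end with a model that is cheap to refit. The sketch assumes a least-squares straight line as a stand-in for retraining a network on each bootstrap sample; the function names, the synthetic data, and the choice of 200 bootstrap samples are illustrative assumptions.

```python
import math
import random


def fit_line(data):
    """Least-squares fit of d = w0 + w1 * u (stand-in for training a network)."""
    n = len(data)
    mean_u = sum(u for u, _ in data) / n
    mean_d = sum(d for _, d in data) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in data)
    slope = sum((u - mean_u) * (d - mean_d) for u, d in data) / s_uu if s_uu > 0 else 0.0
    return (mean_d - slope * mean_u, slope)


def predict(w, u):
    return w[0] + w[1] * u


def bootstrap_pairs_se(data, fit, predict, n_boot=200, seed=0):
    """Return the bootstrap mean and standard error of each prediction."""
    rng = random.Random(seed)
    n = len(data)
    preds = []
    for _ in range(n_boot):
        # Step I: resample whole (input, response) pairs with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        # Step II: refit the model on the bootstrap sample.
        w_b = fit(sample)
        preds.append([predict(w_b, u) for u, _ in data])
    # Step III: mean and standard error of each prediction across the refits.
    means = [sum(p[i] for p in preds) / n_boot for i in range(n)]
    ses = [math.sqrt(sum((p[i] - means[i]) ** 2 for p in preds) / (n_boot - 1))
           for i in range(n)]
    return means, ses


noise = random.Random(1)
data = [(i / 29.0, 1.0 + 2.0 * (i / 29.0) + noise.gauss(0.0, 0.2)) for i in range(30)]
means, ses = bootstrap_pairs_se(data, fit_line, predict)
```

The `fit` and `predict` arguments make the resampling loop independent of the model, which reflects the point that bootstrap methods apply to any network topology. For a straight-line fit the standard errors are larger at the edges of the input range than in the middle.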
2.3 Control Theory

ANNs are capable of modeling any process, making them ideal candidates for
complex process optimization and control strategies. Neurocontrol is but a sub-field of
classical control theory [31]. To put neurocontrol in perspective, it is important to consider its
place within this field.

2.3.1  Classical  Control  Theory 

Classical control theory is strongly biased towards linear time-invariant systems [31].
General nonlinear systems simply do not allow us, because of their analytical
intractability, to formulate a theory that is as strong as that of linear system theory. On the other
hand, nonlinear systems can be qualitatively similar to linear ones under some
circumstances.

2.3.1.1  Linear  control 

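Linear control is concerned with systems of the form

$$\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u} \tag{39}$$

with state $\mathbf{x}$, input $\mathbf{u}$, measurable output

$$\mathbf{y} = \mathbf{C}\mathbf{x}, \tag{40}$$

and controllers of the form

$$\mathbf{u} = -\mathbf{K}\mathbf{x} + \mathbf{L}\mathbf{x}^{*}, \tag{41}$$

where $\mathbf{x}^{*}$ is the reference state, that is, the state to which the plant is to be brought with the
help of the controller [33].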

The  goals  of  linear  control  are: 

1) Altering the closed-loop behavior of the system to some user-defined
response characteristics.

2) Controlling the closed-loop stability, i.e., convergence back to an
equilibrium point after disturbance.
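Both goals can be illustrated on a two-state plant, where state feedback moves the closed-loop eigenvalues to chosen locations. The sketch assumes a feedback law of the form $\mathbf{u} = -\mathbf{K}\mathbf{x}$ with the reference state at the origin; the plant matrices and the gain values are made-up numbers picked so that the result can be verified by hand.

```python
import cmath


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return ((trace + disc) / 2.0, (trace - disc) / 2.0)


def closed_loop_matrix(A, B, K):
    """Return A - B K for a two-state, single-input system (B a column, K a row)."""
    return [[A[i][j] - B[i] * K[j] for j in range(2)] for i in range(2)]


A = [[0.0, 1.0], [2.0, -1.0]]  # open-loop plant with one unstable mode
B = [0.0, 1.0]
K = [8.0, 4.0]                 # gains chosen to place the eigenvalues at -2 and -3

open_loop = eigenvalues_2x2(A)
closed_loop = eigenvalues_2x2(closed_loop_matrix(A, B, K))
```

The open-loop eigenvalues are 1 and -2, so the plant diverges after a disturbance; with feedback the characteristic polynomial becomes $\lambda^2 + 5\lambda + 6$, and both eigenvalues lie in the stable left half-plane.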

The  disadvantages  of  linear  control  designs: 

1) Assume that the world is linear, Gaussian, and stationary, when in reality
the world is none of the above.

2)  Require  complete  a  priori  knowledge  of  the  process  dynamics. 

3)  Require  that  the  process  is  controllable  and  observable. 

4) Cannot follow a reference trajectory produced by a system of lower
order than the process.

2.3.1.2  Robust  control 

Robust control addresses the problem of controlling a plant whose behavior is slightly
different from that of a plant model [37]. The reasons for the difference are predominantly
the effects of the nonlinear, non-Gaussian and nonstationary world. A popular pragmatic
classical approach to robust control is concerned with preserving stability [2]. The
closed-loop eigenvalues are chosen so that they remain in the stability region even if the plant
model should change in a defined range.

Although robust control strategies are primarily designed to compensate for
differences between our linear time-invariant assumptions and the real world, they are still
developed based predominantly on linear system theory. They are therefore not able to
cope with significant deviations from these assumptions.

2.3.1.3  Adaptive  control 

Adaptive control is another way to reach a goal similar to that of robust control [37].
Instead of designing robust controllers that work under conditions different from those for
which they have been designed, adaptive controllers recognize the difference between the
assumption and reality and change to perform better in the new conditions.

Adaptation schemes can be based on either a reference model or a cost functional [2].
The approach called model reference adaptive control (MRAC) is, by its name, committed
to the former. This approach is based on formulating the rules for computing the direction
of change of controller parameters as a function of the difference between the behavior of
the closed-loop system and a reference model. Controller parameters can be adapted either
directly or via the estimation of plant model parameters.

A more general approach is that of self-tuning regulators (STR) [3], which consists of
adaptively estimating a plant model and applying a formalized controller design method
to the plant model. This design method can be based on cost function optimization.

Like robust control, however, adaptive control implementations have been based on
linear, or simple nonlinear parametric, assumptions about the process. As a result,
adaptive control designs have not demonstrated significant success with complex real-world
processes.

2.3.1.4  Nonlinear  control 

Nonlinear control theory is concerned with general systems of the form [31]

$$\dot{\mathbf{x}} = f(\mathbf{x}, \mathbf{u}) \tag{42}$$

with measurable output

$$\mathbf{y} = g(\mathbf{x}) \tag{43}$$

and controllers of the form

$$\mathbf{u} = c(\mathbf{x}, \mathbf{x}^{*}). \tag{44}$$

The general formulation of nonlinear control holds promise to overcome all of the
limitations of the classical control schemes presented above. The field's track record,
however, does not deliver on this promise. The problem is that an analytical solution is known
only for a restrictive subclass of nonlinear systems. The difficulties with genuine
nonlinear controller designs have typically led to a linearization approach, also known as gain
scheduling.

2.3.1.5  Optimal  control 

The topic of optimal control theory is to design controllers that are optimized to a
certain performance criterion [15]. Classical optimal control has primarily focused on
applications where such optimality could be proven analytically. For example, for a linear plant
and a quadratic performance criterion the Riccati controller represents an explicit and
global solution [31]. Dynamic optimization provides another example for state evaluation and
selection of the optimal action, which can be proven optimal in certain applications.
Alternatively, if each state at each sampling period is represented by a node in a directed graph
and actions are represented by connecting edges of the subsequent states, then the task can
also be transformed to the critical graph problem of graph theory [15].

In its most general form, the optimal control problem can be formulated as an
optimization problem [15]. The plant can be generalized as in (42) and the goal is to find a
controller described by (44) that minimizes

$$E[J(\mathbf{x}^{*} - \mathbf{x})], \tag{45}$$

where $E[\,\cdot\,]$ is the mean value over time.

2.3.2 Neurocontrol

Most neurocontrol architectures are either explicit or disguised analogies of classical
control design such as optimal control or numerical Lyapunov-function-based design
methods. It has been argued [14] that it is only the representation of functions by neural
networks that defines the field of neurocontrol in the broad sense and differentiates
neurocontrol from classical control methods.

The author agrees that the use of nonparametric models does differentiate
neurocontrol from classical controls, but would argue that the primary departure from, and
extension to the potential of, control theory is neurocontrol's willingness to depart from a
requirement for analytic solutions. Neurocontrol designs seek to realize the promise of
general nonlinear control by replacing analytic optimization methods with numeric ones.
Researchers in the field of neural networks are accustomed to working in an intractable
world, and have been willing to resolve important questions like stability, robustness and
consistency empirically. The result has been, and will continue to be, an important
extension to control theory.

There are many types of neurocontrol architectures in the literature, each with specific
advantages and disadvantages. The following sections review some of the conceptual
neurocontrol strategies which have been proposed in the literature.

2.3.2.1  Model-predictive  control 

When a model is used indirectly and offline the control scheme is usually referred to
as model-predictive control (MPC) [37]. In most industrial process control applications a
priori knowledge about the process is hard to obtain and black-box models must be used.
The offline training phase performs supervised learning to develop an ANN model for the
process to be controlled, i.e., the ANN attempts to mimic the process after being exposed
to actual process data. This phase can be stated as

$$\operatorname{ArgMin}_{\mathbf{w}}\{J(\mathbf{y} - f(\mathbf{u}, \mathbf{w}))\}, \tag{46}$$

where $\mathbf{y}$ and $\mathbf{u}$ are the process outputs to be controlled and inputs to be manipulated,
respectively.

At the online control phase, the ANN model cannot be used alone; it must be
incorporated into a model-based control scheme [31]. This control scheme is once again an
optimization problem, which can be stated as

$$\operatorname{ArgMin}_{\mathbf{u}}\{J(\mathbf{y}^{*} - f(\mathbf{u}, \mathbf{w}))\}, \tag{47}$$

where $\mathbf{y}^{*} \in \mathbb{R}^{N_y}$ denotes the desired closed-loop process output. This optimization is
performed repeatedly at each time interval during the course of feedback control.
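The online phase can be sketched as a small optimization over the manipulated input with the model weights held fixed. The sketch assumes a squared-error objective, a scalar input and output, and a one-PE tanh model with hand-picked weights standing in for a trained ANN; the function names, the finite-difference gradient, and the step size are illustrative choices rather than a prescribed method.

```python
import math


def model(u, w):
    """Stand-in for a trained process model: a single tanh PE with fixed weights."""
    return w[2] * math.tanh(w[0] + w[1] * u)


def control_move(y_star, w, u0=0.0, lr=0.5, iters=200, eps=1e-6):
    """Search for the input u whose predicted output matches the setpoint y_star."""
    def cost(u):
        return (y_star - model(u, w)) ** 2

    u = u0
    for _ in range(iters):
        # Central finite difference of the cost with respect to the input.
        grad = (cost(u + eps) - cost(u - eps)) / (2.0 * eps)
        u -= lr * grad
    return u


w = (0.0, 1.0, 1.0)  # with these weights the model reduces to y = tanh(u)
u_opt = control_move(y_star=0.5, w=w)
y_pred = model(u_opt, w)
```

In a feedback loop this search would be repeated at every sampling interval, typically starting from the previous control move so that only a few iterations are needed.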

2.3.2.2  Model-inverse  control 

An  ANN  can  be  trained  to  develop  an  inverse  model  of  a  process  [40].  Here,  the 

model's  input  is  the  process  output,  and  the  model's  output  is  the  process  input.  The 
offline  training  phase  can  be  stated  as 

ArgMin^{J(tiP-f(f,^))}.  (48) 
Clearly,  the  inverse  model  is  a  steady-state  model  or  the  resulting  controller  would  be 

non-causal.  Given  a  desired  process  setpoint  p*  ,  the  appropriate  online  control  signal  if 
can  be  immediately  calculated  as 

if=f(P*,^).  (49) 
Successful  applications  of  inverse  modeling  are  discussed  in  [40]  and  [58].  Obviously, 
an  inverse  model  exists  only  when  the  process  behaves  monotonically  as  a  "feed-forward" 
function  at  steady-state.  If  not,  this  approach  is  not  applicable. 
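
A minimal sketch of model-inverse control is given below, assuming a hypothetical monotonic steady-state plant and using a straight-line least-squares fit in place of the ANN inverse model. The plant function and all names are illustrative assumptions.

```python
import numpy as np

def process(u):
    """Hypothetical monotonic steady-state plant."""
    return 1.0 + 3.0 * u

u_data = np.linspace(0.0, 5.0, 20)
y_data = process(u_data)

# Offline phase (Eq. 48): fit u = f(y, w) by least squares. A straight
# line stands in for the ANN inverse model.
features = np.column_stack([np.ones_like(y_data), y_data])
w, *_ = np.linalg.lstsq(features, u_data, rcond=None)

def inverse_model(y_target, w):
    """Online phase (Eq. 49): one function evaluation, no optimization."""
    return w[0] + w[1] * y_target

u_cmd = inverse_model(7.0, w)
print(u_cmd, process(u_cmd))
```

If the plant were not monotonic, two different inputs could produce the same output, and the training data would no longer define a single target input for each output value.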

2.3.2.3 Controller modeling

Another simple direct neurocontrol scheme is to use a neural network to model an existing
controller. The input to the existing controller is used as training input to the ANN
model, and the controller output serves as the desired response. This approach is similar to
model-inverse control except that the system being modeled here is not a process but a
controller. This approach can be formulated as

$\arg\min_{\vec{w}} \{ J(\vec{u}^c - f(\vec{x}, \vec{w})) \}$,  (50)

where $\vec{u}^c$ are the decision variables generated by an existing controller in response to the
plant states $\vec{x}$.

Like a process, a controller is generally dynamic and often comprises integrators or
differentiators. If an algebraic feed-forward network is used to model the existing
controller, dynamic information must be explicitly provided as input to the ANN model.
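
The need for explicit dynamic inputs can be illustrated with a small sketch. It assumes the existing controller is a hypothetical PI controller and uses a linear least-squares fit as a stand-in for training a feed-forward network. The gains, signal names, and synthetic data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, kp, ki = 0.1, 2.0, 0.5                 # hypothetical PI controller settings

error = rng.normal(size=500)               # recorded controller input e(t)
integral = np.cumsum(error) * dt           # dynamic information: running integral of e
u_existing = kp * error + ki * integral    # recorded controller output

def fit(inputs, targets):
    """Least-squares stand-in for training a feed-forward network (Eq. 50)."""
    w, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    rmse = np.sqrt(np.mean((inputs @ w - targets) ** 2))
    return w, rmse

# Static input only: the algebraic model cannot reproduce the integrator.
_, rmse_static = fit(error[:, None], u_existing)

# Integral supplied as an extra input: the same algebraic model now fits.
w_dyn, rmse_dynamic = fit(np.column_stack([error, integral]), u_existing)
print(rmse_static, rmse_dynamic, w_dyn)
```

With the integral supplied as an input, the fitted weights recover the two controller gains. With the instantaneous error alone, a substantial residual remains.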

In  general,  this  approach  can  result  in  controllers  that  are  faster  and/or  cheaper  than 
traditional  controllers.  Using  this  approach,  for  example,  Pomerleau  [50]  presented  an 
intriguing  application  where  a  neural  network  was  used  to  replace  a  human  operator,  i.e., 
an  existing  controller. 

2.3.2.4  Model-free  direct  control 

Without an existing controller or process knowledge, controllers have to adapt or
learn the way a human operator learns to control or operate a process for the first time. A
model-free neurocontrol design objective can be stated as

$\arg\min_{\vec{w}^c} \{ J(\vec{y}^* - \vec{y}),\ \vec{u} = f(\vec{x}, \vec{w}^c) \}$,  (51)

where $f$ is an ANN that directly controls the process inputs, and $\vec{w}^c$ are the weights
of this network. Notice that the optimization criterion $J$ is only a function of the actual
and desired process outputs. This means that the optimization methodology employed
must be able to learn without an explicit desired response or even a mathematical
linkage to the criterion.

The  key  feature  of  this  direct  adaptive  approach  is  that  a  process  model  is  neither 
known in advance nor explicitly developed during control design. The most common
learning algorithm for this type of control design is referred to as reinforcement learning.
The first work in this area was the "adaptive critic" algorithm proposed by Barto et al. [7].
Such an algorithm can be considered an approximate version of dynamic programming
[73][8], later termed Neuro-Dynamic Programming [12].

Despite its historical importance and intuitive appeal, model-free adaptive neurocontrol
is not appropriate for most real-world applications. The plant is most likely out of
control during the learning process, and few industrial processes can tolerate the large number
of failures required to adapt the controller.
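
The cost of learning directly on the plant can be seen in a minimal sketch. It does not implement the adaptive critic. It substitutes a much simpler random-perturbation search that, like reinforcement learning, sees only the criterion J of Eq. (51). The plant, the one-weight controller, and all names are hypothetical.

```python
import numpy as np

def plant(u):
    """Hypothetical process; the learner never sees this expression."""
    return 2.0 * u

def trial_cost(w, y_target):
    """Run the controller u = w * y* on the plant and score J = (y* - y)^2."""
    return (y_target - plant(w * y_target)) ** 2

rng = np.random.default_rng(1)
w, y_target = 0.0, 3.0
best = trial_cost(w, y_target)
initial_cost, trials = best, 1

for _ in range(200):
    candidate = w + rng.normal(scale=0.1)    # blind perturbation of the weight
    cost = trial_cost(candidate, y_target)   # every evaluation is a live plant trial
    trials += 1
    if cost < best:                          # keep only the changes that helped
        w, best = candidate, cost

print(w, best, trials)
```

Every rejected candidate in this loop corresponds to a period in which the real plant ran with a worse controller, which is the practical objection raised above.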

2.3.2.5 Model-reference direct control

From a practical perspective, one would prefer to let failures take place in a simulated
environment with a process model rather than in a real plant. Even if failures are not
disastrous, they can cause substantial losses. The performance of a controller could be evaluated
based on a model for the process, rather than the process itself. The training stage of the
control design can be given by

$\arg\min_{\vec{w}^m} \{ J(\vec{y} - f(\vec{u}, \vec{w}^m)) \}$,  (52)

and the control design becomes

$\arg\min_{\vec{w}^c} \{ J(\vec{y}^* - f(\vec{u}^c, \vec{w}^m)),\ \vec{u}^c = f(\vec{x}, \vec{w}^c) \}$,  (53)

where $\vec{w}^m$ and $\vec{w}^c$ are the weights of the process model and of the controller network,
respectively.

In the course of modeling the plant, the plant must be operated "normally" instead of
being driven out of control. After the modeling stage, the model can be used for controller
design. If a process model is already available, an ANN controller can be developed in a
simulation in which failures cannot cause any loss but that of computer time. A neural
network controller, after extensive training in the simulation, can then be installed in the actual
control system.
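
The two stages of Eqs. (52) and (53) can be sketched as follows, assuming a hypothetical linear plant, a first-order polynomial fit in place of the ANN process model, and a two-weight linear controller tuned by gradient descent. None of these choices are taken from this work. They are the simplest stand-ins that show the model absorbing all of the training failures.

```python
import numpy as np

def plant(u):
    """Hypothetical real process, used only for logging and the final check."""
    return 1.5 * u + 0.5

# Stage 1 (Eq. 52): fit model weights w_m from data logged in normal operation.
u_log = np.linspace(1.0, 3.0, 15)
w_m = np.polyfit(u_log, plant(u_log), deg=1)    # [slope, intercept]

def model(u):
    """Process model f(u, w_m)."""
    return np.polyval(w_m, u)

def controller(y_target, w_c):
    """Hypothetical controller u = w_c[0] * y* + w_c[1]."""
    return w_c[0] * y_target + w_c[1]

# Stage 2 (Eq. 53): tune controller weights w_c against the model only.
targets = np.linspace(2.0, 5.0, 10)
w_c = np.zeros(2)
for _ in range(5000):
    err = model(controller(targets, w_c)) - targets   # simulated errors cost nothing
    grad = 2.0 * w_m[0] * np.array([np.mean(err * targets), np.mean(err)])
    w_c -= 0.02 * grad

# Deployment: the trained controller is applied to the real plant.
print(w_c, plant(controller(4.0, w_c)))
```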

Model-reference direct control schemes have not only proven effective in several
studies [41][63], but have also already produced notable economic benefits [60]. These
approaches can be used for both offline control and online adaptation.

2.4  NOx 

The  Clean  Air  Act  Amendments  of  1990  require  that  electric  utilities  make  significant 
reductions  in  nitrogen  oxide  (NOx)  emissions  from  their  fossil-fired  power  plants.  To 
date,  most  efforts  to  reduce  NOx  emissions  have  come  from  expensive  hardware  retrofits 
with  less  than  satisfactory  performance.  Further  complicating  matters,  conditions  that 
decrease NOx formation (lower temperature, excess fuel) result in the formation of other
polluting compounds, mainly carbon monoxide (CO). Similar emissions reductions are being
required in Europe through local and European Economic Community (EEC) initiatives.

Nitrogen monoxide (NO) and nitrogen dioxide (NO2) are by-products of the combustion
process of virtually all fossil fuels. Historically, the quantity of these inorganic
compounds in the products of combustion was not sufficient to affect boiler performance; their
presence was largely ignored. In recent years, oxides of nitrogen have been shown to be
key constituents in the complex photochemical oxidant reaction with sunlight to form
smog. Today, the emission of NO2 and NO (collectively referred to as NOx) is regulated
by the 1990 Clean Air Act Amendments and has become an important consideration in the
design of fuel firing equipment.

NOx is formed by two primary mechanisms: thermal NOx and fuel-bound NOx. Thermal
NOx formation occurs only at high flame temperatures, when dissociated nitrogen
from combustion air combines with oxygen atoms to produce oxides of nitrogen such as
NO and NO2. The formation of thermal NOx increases exponentially with combustion
temperature and increases by a square-root relationship with the presence of oxygen in the
combustion zone. Fuel-bound NOx formation is not limited to high temperatures, but is
dependent upon the nitrogen content of the fuel. The best way to minimize NOx formation
is to reduce flame temperature, reduce excess oxygen, and/or burn fuels with low nitrogen
content. Conditions that decrease NOx formation (lower temperature, excess fuel)
can result in incomplete combustion. These conditions result in the formation of other
polluting compounds, mainly carbon monoxide (CO).
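
The temperature and oxygen dependence of thermal NOx described above can be made concrete with a small sketch. It assumes an Arrhenius-type exponential temperature term and a square-root oxygen term. The activation temperature is an illustrative value rather than a fitted one, and the result is a relative rate in arbitrary units, not an emission prediction.

```python
import math

def relative_thermal_nox(temp_k, o2_fraction, activation_temp_k=69000.0):
    """Relative thermal NOx formation rate, arbitrary units.

    Assumes an Arrhenius-type temperature term and a square-root oxygen
    term; activation_temp_k is an illustrative value, not a fitted one.
    """
    return math.exp(-activation_temp_k / temp_k) * math.sqrt(o2_fraction)

base = relative_thermal_nox(1900.0, 0.03)
cooler = relative_thermal_nox(1800.0, 0.03)     # flame 100 K cooler
low_o2 = relative_thermal_nox(1900.0, 0.0075)   # one quarter of the oxygen

print(cooler / base, low_o2 / base)
```

Under these assumptions, cutting the oxygen to one quarter halves the rate, while a modest drop in flame temperature reduces it far more. This is consistent with the emphasis placed on flame temperature in the strategies that follow.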

2.4.1  Reduction 

The available NOx reduction technologies can be categorized into one of the following:

•  Before Combustion: Nitrogen is extracted from the fuel. This is relatively
ineffective, since most of the nitrogen in the formation of NOx comes from the air
(containing N2).

•  After Combustion: NOx is chemically reduced before leaving the stack. This
process is also expensive, requiring hardware retrofits.

•  During Combustion: Altering fuel and air flows and introducing them at different
points of the furnace can create several zones with different temperatures and
stoichiometry. These parameters significantly affect the rate of NOx formation.

The  following  section  reviews  available  NOx  reduction  strategies  and  technologies  for 
combustion  sources. 

•  Fuel  Switching:  Fuel-bound  NOx  formation  is  most  effectively  reduced  by 
switching  to  a  fuel  with  lower  nitrogen  content.  No.  6  fuel  oil  or  another  residual 
fuel having a relatively high nitrogen content can be replaced with No. 2 fuel oil,
another  distillate  oil  or  natural  gas  (which  is  essentially  nitrogen-free)  to  reduce 
NOx  emissions. 

•  Flue Gas Recirculation (FGR): Flue gas recirculation involves extraction of some
of the flue gas from the stack, and recirculation with the combustion air supplied
to the burners. The process reduces both the oxygen concentration at the burners
and the temperature by diluting the combustion air with flue gas. CO can
become a significant problem here.

•  Low NOx Burners: Installation of burners especially designed to limit NOx
formation can reduce NOx emissions. Higher reduction efficiencies can be
achieved  by  combining  a  low  NOx  burner  with  FGR.  Low  NOx  burners  are 
designed  to  reduce  the  peak  flame  temperature  by  inducing  recirculation  zones, 
staging  combustion  zones,  and  reducing  local  oxygen  concentrations. 



•  Derating: Some industrial boilers may be derated to produce a reduced quantity
of steam or hot water. Derating will decrease the flame temperature within the
unit, reducing formation of thermal NOx. Derating can be accomplished by
reducing the firing rate or by installing a permanent restriction, such as an orifice
plate, in the fuel line. Clearly, this solution would have significant economic
impact on the unit.

•  Steam or Water Injection: By injecting a small amount of water or steam into the
immediate vicinity of the flame, the flame temperature will be lowered and the
local oxygen concentration reduced. The result would be to decrease the formation
of thermal and fuel-bound NOx. This process generally lowers the combustion
efficiency of the unit by one or two percent.

•  Staged Combustion: Either air or fuel injection can be staged, creating either a
fuel-rich zone followed by an air-rich zone, or an air-rich zone followed by a
fuel-rich zone. A low NOx burner utilizing staged combustion can be installed,
or the furnace itself can be retrofitted for staged combustion.

•  Fuel Reburning: Staged combustion can be achieved through the fuel reburning
process. A Gas Reburning Zone (GRZ) is created above the primary combustion
zone. In the GRZ, additional natural gas is injected, creating a fuel-rich region
where hydrocarbon radicals react with NOx to form molecular nitrogen.

•  Reduced Oxygen Concentration: Decreasing excess air reduces the oxygen
available in the combustion zone and lengthens the flame, resulting in a lower heat
release rate per unit flame volume. NOx emissions are reduced in an approximately
linear fashion with decreasing excess air. However, as excess air is
reduced beyond a threshold value, combustion efficiency will decrease due to
incomplete mixing, and CO emissions will increase.

•  Selective Catalytic Reduction (SCR): Selective catalytic reduction is a
post-formation NOx control technology that uses a catalyst to facilitate a chemical
reaction between NOx and ammonia to produce nitrogen and water. An
ammonia/air or ammonia/steam mixture is injected into the exhaust gas, which
then passes through a catalyst where NOx is reduced. To optimize the reaction,
the temperature of the exhaust gas must be in a certain range when it passes
through the catalyst bed. Among its disadvantages, SCR requires additional
space for the catalyst and reactor vessel, as well as ammonia storage, distribution,
and injection systems. Precise control of ammonia injection is critical. An
inadequate amount of ammonia can result in unacceptably high NOx emission
rates, while excess ammonia can lead to ammonia "slip", or the venting of
undesirable ammonia to the atmosphere.

•  Selective Non-Catalytic Reduction (SNCR): Selective non-catalytic NOx reduction
involves injection of a nitrogenous agent, such as ammonia or urea, into the
flue gas. The optimum injection temperature when using ammonia is 1850
degrees F, at which 60 percent NOx removal can be approached. The optimum
temperature range is wider when using urea. Below the optimum temperature
range, ammonia is formed, and above, NOx emissions actually increase. The
success of NOx removal depends not only on the injection temperature, but also
on the ability of the agent to mix sufficiently with flue gas.
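
The trade-off noted under Reduced Oxygen Concentration, where NOx falls roughly linearly with excess air while CO rises once a threshold is crossed, can be sketched as a small constrained search. Every coefficient, the threshold, and the CO limit below are hypothetical and chosen only to make the shape of the problem visible.

```python
import numpy as np

def nox_ppm(excess_air_pct):
    """Hypothetical: NOx falls roughly linearly as excess air is reduced."""
    return 100.0 + 8.0 * excess_air_pct

def co_ppm(excess_air_pct, threshold_pct=5.0):
    """Hypothetical: CO is flat above a threshold and climbs steeply below it."""
    shortfall = np.maximum(threshold_pct - excess_air_pct, 0.0)
    return 20.0 + 60.0 * shortfall ** 2

excess_air = np.linspace(0.0, 20.0, 201)      # candidate settings, percent
feasible = co_ppm(excess_air) <= 100.0        # assumed CO limit, ppm
candidates = excess_air[feasible]
best = candidates[np.argmin(nox_ppm(candidates))]

print(best, nox_ppm(best), co_ppm(best))
```

The lowest-NOx setting that still respects the CO limit sits just below the threshold, which is why the achievable reduction depends on how sharply CO responds there.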

2.5 Fossil-Fired Power Generation

In general, Canal Unit 2 is a large fossil fuel combustion engine. From an abstract
perspective, the combustion process takes in air and fuel, and produces energy and exhaust,
as described by:

1)  Air:  Fossil  fuel  combustion  requires  air,  or  more  specifically  the  oxygen 
contained  in  air.  Subsystems  within  the  plant  measure,  prepare  and 
introduce  this  air. 

2)  Fuel: Combustion also requires fuel. In the case of Canal Unit 2, the fuel
can be either #6 residual oil (left over from the refining process) or natural
gas. Canal Unit 2 can fire oil only, gas only, or a mixture of both.
Both fuels must be measured, prepared and introduced to the furnace.

3)  Energy: The energy released by the oxidation of fossil fuels during combustion
is used to make steam. The properties of water still make it the
best choice when converting thermal energy to work. Canal uses the
radiative and convective heat from the combustion process to transform
ultra-clean water into superheated steam. The expansion of this steam is
used to turn a turbine that turns a coil in a magnetic field, producing
electric potential. The steam, having done this work, flows through
ocean-water-filled condensers that convert it back to ultra-clean water.

4)  Exhaust: The gaseous products of combustion, having contributed much
of their heat content to the production of steam, are cleaned electrostatically
and ejected into the atmosphere.

2.5.1  Process  Variables 

The specific process variables as they apply to the Canal generating unit are described
in more detail in the following sections. These variables are also listed in the Appendix, and
will be referred to throughout this work.

2.5.1.1  Air 

The  air  required  for  fossil  fuel  combustion  is  prepared  and  introduced  in  two  ways. 
Two large symmetrical fans called Forced Draft (FD) fans push ambient air through a series of
preheaters that warm this air to between 80 and 180 degrees F. This hot, oxygen-rich air is
then pressed into a windbox that surrounds the furnace, enclosing the burner ports.
Through  an  array  of  vents  called  Primary  and  Secondary  Air  Shrouds  around  each  burner 
and  through  secondary  ports  called  Overfire  Air  Ports  this  pressurized  air  is  vented  into 
the  combustion  zone.  In  addition  to  this  oxygen  rich  air,  Canal  Unit  2  has  the  ability  to 
recycle  exhaust  gas  into  the  combustion  zone  through  a  Gas  Recirculation  System. 

The  measurements  of  all  this  air  are  a  function  of  boiler  design,  and  fan  capacities.  To 
increase  the  output  of  the  engine,  additional  air  must  be  throttled  through  these  devices. 

2.5.1.1.1  Forced  draft  system 

The forced draft fans are 2500 horsepower, 624,000 cfm centrifugal fans with inlet
vane throttles. They are constant-speed fans, meaning that the fan shaft turns at a constant
speed while more or less air with more or less initial spin can be dumped into the blades
by opening or closing the vanes. If the vanes are only slightly opened, the flow volume of
air available to the fans is small, and it takes less work to move it. Fan amps will be
correspondingly low. If the inlet vanes are opened wider, the flow is greater. Still, the fan moves
at a constant speed. More work is being done, and the amperage must increase. The output
of the FD fans is derived from the boiler master signal. Forced draft output is specified
along with fuel flow by the fuel-air curve of the boiler. The fuel-air curve gives a total air
flow requirement, as well as a total fuel flow requirement, for a given load.

These fans are symmetrical to the furnace, like many other systems, and they operate
symmetrically through their respective ducts unless biased. Bias represents an addition or
subtraction of signal to the B side FD fan. These fans can also be trimmed to meet slightly
less or slightly more than the Total Air Flow demanded by the fuel-air curve of the Boiler.
The FD fans are the principal air throttles of the Boiler and so have a fundamental effect
on nearly every other system.
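
The relationship between load, the fuel-air curve, trim, and bias can be sketched as a lookup with two offsets. The breakpoints below are illustrative and are not Canal Unit 2 data. The sketch also assumes, for simplicity, that both fans receive the same demand signal before bias is applied.

```python
import numpy as np

# Hypothetical fuel-air curve: breakpoints are illustrative, not Canal data.
LOAD_MW = np.array([100.0, 300.0, 500.0])
AIR_DEMAND_PCT = np.array([30.0, 60.0, 100.0])

def fd_fan_demands(load_mw, trim_pct=0.0, b_bias_pct=0.0):
    """Return (A-side, B-side) FD fan demand for a given load.

    trim_pct shifts the total air demand off the curve; b_bias_pct is
    added to the B-side fan only, as described in the text.
    """
    total = float(np.interp(load_mw, LOAD_MW, AIR_DEMAND_PCT)) + trim_pct
    return total, total + b_bias_pct

print(fd_fan_demands(400.0))
print(fd_fan_demands(400.0, trim_pct=-2.0, b_bias_pct=1.5))
```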

2.5.1.1.2  Forced  draft  fan  inlet  vanes 

Since the inlet vanes' positions represent the work being done by the fan and are the
control most familiar to the operators, these tags were used to represent the FD fans.

2.5.1.1.3 O2 trim

This  tag  represents  the  bias  that  operators  set  into  the  airflow  demand  predetermined 
by  load.  Functionally  this  control  trims  the  response  of  the  FD  Fan  to  Air  Demand.  This 
tag  gives  the  operators  the  ability  to  run  the  furnace  slightly  lean  or  rich  overall. 

2.5.1.1.4  Induced  draft  system 

As mentioned in the section on the Forced Draft Fans, the function of the ID Fans is to
take whatever gases are present in the furnace, including air that has been introduced by
the load-following Forced Draft Fans, plus all products of combustion, and pull them out,
maintaining a constant underpressure in the furnace of -0.5 inches of water column. The
FD Fans' speed is kept constant while the volume of air they move is throttled with inlet
vane controls. Canal is limited by the power of these fans. Current unit maximum output is
frequently limited by the ability of these fans to keep up with the increased air flows of the
recently installed low NOx shroud and overfire air system.

2.5.1.1.5 Induced draft fan inlet vanes

The induced draft fan inlet vanes are the inlet throttles to the fans. They open in
response to requests for increased output and, as in the case of the FD fans, represent the
work being done.

2.5.1.1.6 Combustion air temps

The combustion air temperature tags represent the temperature of the incoming air
after the FD Fans. The temperature of this air is a direct result of energy added to ambient
air by the Glycol Air Preheater (GAH) and the Combustion Air Preheater (CAH). Since
density is a function of temperature, the temperature of this air can impact the combustion
process, which is sensitive to the oxygen content of air, as well as the operation of other
volumetric systems like the Induced Draft (ID) Fans. It also has a primary impact on exhaust
gas temperature and resultant stack gas velocity.

2.5.1.1.7  Primary  air  shrouds 

The Primary Air (PA) Shrouds represent the circular articulating vents that surround
the individual burner orifices. These are closest to the fuel gun, concentrically inside the
Secondary Air (SA) Shrouds. They are responsible for supplying primary combustion air
to the flame front. These tags represent actuator positions.

The PA shrouds are controlled by the Burner Management System (BMS), and they
move as a group from minimum position (5% open to provide cooling air) toward open as
load increases. The signal that controls them is called the Primary Air Master Demand
(PAMD). Separate PAMD signals exist for fuel gas primary air demand and for fuel oil
primary air demand. Each specific burner effectively listens to the current fuel state.
Having received this signal, each burner's own PA shrouds respond to the PAMD in
accordance with one of two functions that are unique to it: a Burner Primary Air Shroud
Function for oil operation and a Burner Primary Air Shroud Function for gas operation.
The correct unique local shroud function is selected, along with the correct master signal,
depending on the fuel state of the burner. These burner- and fuel-specific response
functions were set up to give roughly appropriate air flow to combustion at all load points and
fuel states based on the air flow inherent to the furnace.

Aside from normal operation, the PA shrouds can be biased from the fuel-specific master
signal or on an individual basis from their respective unique functions.

2.5.1.1.8  Secondary  air  shrouds 

The secondary air shroud tags represent the broadcast actuator positions of the second,
outer concentric set of circular articulating vents that surround the individual burner
orifices.

The  first  function  of  the  Secondary  Air  (SA)  Shrouds  is  to  introduce  combustion  air  to 
the  flame  front  following  load.  Their  second  function  is  to  balance  windbox  pressure,  and 
therefore  total  airflow,  against  the  actuation  of  the  Overfire  Air  Ports  and  the  PA  shrouds. 

The SA Shrouds have a master signal against which a master bias can be set. In addition,
they have individual actuating functions and individual biases that can be set against
these individual functions.

2.5.1.1.9 Overfire air ports

The Overfire Air (OFA) ports are rectangular louvered ports that pass combustion air
from the Windbox to the Furnace above the top burner level. In doing this they re-oxygenate
the oxygen-depleted flame front. The tags themselves represent the positions broadcast
from the actuators that control the articulating louvers.

The OFA ports were installed as a part of the low NOx retrofit of 1996. The Forney
low NOx burner system is designed to burn more coolly and incompletely than normal.
NOx formation has been positively linked with time exposure to higher temperatures.
After partial combustion has taken place low in the flame front, extra oxygen-rich
combustion air is introduced through the OFA ports to complete the process. In this way the low
NOx burner system stages off-stoichiometric combustion to manage combustion products.

The OFA port actuators receive their master signal from load. This signal can be biased.
Each actuator's response is based on a unique function that was parametrically
determined, in concert with the Primary and Secondary Air Shrouds, during installation to give
the best airflow to combustion at all load points.

2.5.1.1.10  Air  preheater  temps 

These represent the temperature of the exhaust gases entering and leaving the Ljungstrom
combustion air heat exchanger. The Ljungstrom is a large (30 ft. dia.) rotating wheel,
arranged perpendicular to the gas flow. It is half enclosed by the exhaust ducts and half
enclosed by fresh air ducts. As this wheel slowly rotates, heat is absorbed by a given area
of the wheel exposed to exhaust gas. The absorbed heat is then imparted to the incoming
air while that same section traverses the fresh air duct. Elaborate seals and pressurized
sealing air keep the two gases from mingling across this device.

The air preheater tags are somewhat redundant. The "In Temps" represent the temperature
of the gas on its way in, while the "Out Temps" represent the temperature of the gas
on the way out. The heat exchange of the air preheater is a function of the device and of
the temperatures of the two gases and is not controllable in the least. The gas temp after
the air preheater heat exchange was a more familiar control to the operators; however, our
ability to collect these signals was compromised by a failing thermocouple during a large
part of the data collection for phase 1. The gas temp before the air preheater was used to
represent exit gas temp for the modeling instead.

2.5.1.1.11 Windbox and furnace

The windbox is an enclosed volume that surrounds the waist of the furnace and the
burner openings. Preheated, oxygen-rich air is pressurized in this volume by the FD fans.
From here this air can pass only into the furnace and only through vanes that surround the
burner openings, called primary and secondary air shrouds, or through the overfire air ports
above the burners. Canal Unit 2 is a balanced draft furnace, which means that air flow
through the furnace is controlled around a desired furnace pressure by both pushing and
pulling fan systems. The pushing fans are the FD fans, while the pulling fans are the
induced draft fans. The FD fans have the primary responsibility of getting the combustion
zone all the oxygen it requires. The introduction of this pressurized air is accomplished not
only by positively pressurizing the windbox but also by negatively pressurizing the
furnace. With the windbox driven to a positive pressure and the furnace kept at a fixed
relative negative pressure, the velocity of combustion airflow is assured. The induced draft
fans have primary responsibility for maintaining the furnace at a negative pressure relative
to the windbox. In the course of increasing unit output the FD fans increase air flow. Their
aim is to maintain windbox pressure at +2 inches of water column while air transfer to the
furnace increases through the widening overfire ports and primary and secondary air
shrouds. The induced draft fans, trying to maintain a constant pressure of -0.5 inches of
water column in the furnace despite this increasing flow of air from the windbox, also
ramp up. The opposite happens for decreasing load. When the forced draft fans decrease
their output in step with the fuel-air demand, air flow from the windbox to the furnace
decreases. In order to maintain a constant -0.5 inwc in the furnace, the induced draft fans
throttle back. Transient changes in the windbox-to-furnace pressure differential can also
produce automated changes in the FD and ID fan flows.
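
The balanced-draft behavior described above can be illustrated with a toy simulation in which the ID fans hold furnace pressure at -0.5 inwc while the FD fans step up the incoming air flow. The first-order furnace dynamics, the PI control law, the gains, and the flow units are all assumptions made for illustration. They are not a description of the Canal Unit 2 control system.

```python
def simulate_furnace_draft(steps=600, dt=0.1):
    """Toy balanced-draft loop: PI control of ID fan flow to hold furnace pressure."""
    setpoint = -0.5                   # inwc, furnace pressure target from the text
    k, kp, ki = 0.05, 40.0, 20.0      # hypothetical furnace gain and PI tuning
    pressure, integral = setpoint, 100.0
    history = []
    for n in range(steps):
        air_in = 100.0 if n * dt < 10.0 else 120.0   # FD fans ramp up with load
        error = pressure - setpoint                  # positive when draft is too weak
        id_flow = kp * error + integral              # ID fans pull harder when error > 0
        pressure += dt * k * (air_in - id_flow)      # furnace fills or empties
        integral += dt * ki * error
        history.append((pressure, id_flow))
    return history

history = simulate_furnace_draft()
final_pressure, final_id_flow = history[-1]
peak_pressure = max(p for p, _ in history)
print(final_pressure, final_id_flow, peak_pressure)
```

After the step in incoming air, the furnace pressure rises briefly above its setpoint and is then pulled back as the ID fan flow ramps up to match the new FD fan flow.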

2.5.1.1.12 Windbox pressure

This tag represents the positional average windbox air pressure. It is controlled
around +2 inwc.

2.5.1.1.13  Furnace  pressure 

This  tag  represents  the  actual  furnace  air  pressure. 

2.5.1.2  Fuel 

The fuel required for combustion may be either #6 Fuel Oil or Natural Gas. In both
cases the fuel is taken from storage, filtered, heated to a greater or lesser degree,
pressurized, and injected. In the case of #6 Fuel Oil, the temperature required to achieve a
pumpable consistency is usually around 200 degrees. Natural Gas comes from high-pressure
transmission lines and, once stepped down to usable pressure, is warmed to around 80
degrees F. Both fuels are then pressurized in their respective headers. It is from these
headers that burners, when they are lit, tap their fuel.

2.5.1.2.1  Burners-on/fuel 

These tags represent the readings of an array of air-cooled optical flame scanners
located in the furnace itself that observe the respective burner flames. Since each burner
can fire either natural gas or fuel oil, a scanner calibrated for each fuel-specific flame is
permanently assigned to each burner. Although these scanners are analog devices, their
primary function is to confirm that the flame emanating from each lit burner is of a
threshold quality. If the flame they are monitoring is not of a threshold quality, the scanner
can declare a Master Fuel Trip and cut off all fuel to the furnace. This is to prevent
the introduction of unburned fuel to the furnace. Because the scanners
are calibrated with the single purpose of either positively or negatively confirming
this threshold, they essentially read either 1 or 0. This specific set represents the flame
quality of its burner if that burner is on natural gas.
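
The way an analog scanner reading collapses to a 1 or 0 flag, and how that flag feeds a trip decision, can be sketched as follows. The 0-to-1 signal scale, the threshold value, and the function names are illustrative assumptions. The actual scanner calibration and trip logic are not given in the text.

```python
def flame_confirmed(scanner_signal, threshold=0.5):
    """Reduce an analog scanner reading to 1 (flame proven) or 0 (no flame).

    The 0-to-1 signal scale and the threshold are illustrative assumptions.
    """
    return 1 if scanner_signal >= threshold else 0

def master_fuel_trip(scanner_signals, burners_lit):
    """Trip if any lit burner fails to show a flame of threshold quality."""
    return any(lit and not flame_confirmed(signal)
               for signal, lit in zip(scanner_signals, burners_lit))

# Third burner is not lit, so its weak signal is ignored.
print(master_fuel_trip([0.9, 0.8, 0.1], [True, True, False]))
# Second burner is lit but its flame has dropped below threshold.
print(master_fuel_trip([0.9, 0.2, 0.1], [True, True, False]))
```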

2.5.1.2.2 Burner cells 1-8A & 1-8B MN gas flame

These  are  the  signals  for  gas  flame  status  for  each  burner. 

2.5.1.2.3 Burner cells 1-8A & 1-8B MN oil flame

These  are  the  signals  for  oil  flame  status  for  each  burner. 

2.5.1.2.4  Fuel  type 

As the Boiler Master requests increased output, BTUs are requested from the Fuel Supply
Systems. By default this request is divided evenly among the burners in service,
each of which has a setting for BTU content per unit of fuel. The total BTUs entering the
furnace via the burners in service must equal this demand.
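
The default division of BTU demand among burners can be written out directly. The demand, burner count, and heating value used below are illustrative numbers, not Canal Unit 2 settings, and the function names are hypothetical.

```python
def btu_per_burner(total_btu_demand, burners_in_service):
    """Split the boiler master BTU demand evenly across burners in service."""
    if burners_in_service <= 0:
        raise ValueError("at least one burner must be in service")
    return total_btu_demand / burners_in_service

def fuel_flow_per_burner(total_btu_demand, burners_in_service, btu_per_unit_fuel):
    """Convert each burner's BTU share to fuel flow using its heating-value setting."""
    return btu_per_burner(total_btu_demand, burners_in_service) / btu_per_unit_fuel

share = btu_per_burner(1.6e9, 16)
oil_flow = fuel_flow_per_burner(1.6e9, 16, 150_000.0)   # illustrative BTU per gallon
print(share, oil_flow)
```

Taking a burner out of service raises the share carried by each remaining burner, since the total must still equal the demand.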

2.5.1.2.5  Fuel  oil 

The  fuel  oil  introduction  system  consists  of  a  main  pressure  generating  pump  that 
ramps  up  in  output  as  the  unit  master  demand  requests  more  output  in  the  form  of  BTUs. 
This  pump  supplies  an  operating  pressure  to  the  fuel  oil  header.  All  oil  burners  once  they 
are  lit  and  placed  into  service  tap  a  fixed  orifice  from  this  header.  Since  fuel  oil  pressure  is 
fixed  by  the  number  of  BTUs  requested  by  load,  and  the  orifice  of  each  burner  tip  is  a 
fixed  diameter  if  open,  the  number  of  burners  in  service  will  dramatically  affect  Fuel  Oil 
Pressure. Changes in the number of burners lit can vary the fuel oil pressure in the header
between  65  and  150  PSIG. 

Fuel  Temp  Fired  must  be  at  least  the  temp  required  for  pumpability,  which  is  specific 
to  the  viscosity  of  the  fuel  oil  being  used. 

2.5.1.2.6  Natural  gas 

In a fashion similar to the fuel oil introduction system, the unit master demand
requests BTUs from the gas system. Fuel gas from the pipeline is stepped down to operating
pressure, filtered, warmed and supplied to a main gas header. All gas burners when lit
tap  a  fixed  orifice  from  this  header.  The  number  of  burners  lit  on  gas  can  affect  the  actual 
gas  pressure  indicated  at  the  header. 

2.5.1.2.7  Burner  atomization 

These tags represent the essential fuel oil atomizing steam parameters. Atomizing
steam is dry superheated steam extracted from the turbine or the reboiler and injected into
the oil burner tips to atomize the fuel oil as it is introduced to the combustion zone.

Burner atomizing steam pressure runs at a specified 20 psig over fuel oil pressure.
Burner atomizing steam flow is modulated to maintain this constant difference from fuel
oil pressure while the actual temperature fluctuates somewhat at the point of extraction.
The ideal gas law, PV = nRT, connects these three variables, with temperature being
somewhat variable, flow being the control, and pressure being the set point.

2.5.1.2.8 Fuel oil / fuel gas flow differential

This  tag  represents  the  ratio  of  BTUs  contributed  by  the  fuel  oil  system  vs.  the  BTUs 
contributed  by  the  fuel  gas  system  to  the  total  BTUs  required  for  a  given  load. 

2.5.1.2.9 Energy

During operation at Canal Unit 2, feedwater, pressurized by a large parasitic turbine-driven
pump, is circulated through a series of preheaters and then through the very walls of
the furnace. During this passage it is converted to steam. This steam is then collected in a
pressure vessel called a Steam Drum, located at the top of the boiler, where it is "dried".
From the Steam Drum this dry saturated steam is passed through the radiator-like Primary
and Secondary Superheaters that hang at the top of the furnace, where convective and
radiative heat transfer occurs. From the outlet of the Secondary Superheater the
steam goes directly to the High Pressure inlet of the Turbine. Unit 2 is a single-reheat
boiler, which means that the exhaust from the high pressure turbine, instead of being
condensed, is passed back to the boiler and re-superheated. This re-superheated steam then
turns the Intermediate and Low Pressure Turbine Stages. Attemperating sprays inject cool
feedwater into the steam cycle between the Primary and Secondary Superheaters and also before
the Reheat Superheater. These cooling sprays dampen thermal dynamics and keep steam
temperature at the turbine roughly constant around 1000 degrees F.

2.5.1.2.10  Generation 

The Westinghouse turbine generator converts the expansion energy of superheated
steam into rotational energy in the turbine. This rotational energy is imparted to a
coil enclosed in an induced electromagnetic field. The rotation of this coil in the excited
field creates an electric potential at the ends of the coil. This potential can deliver
roughly 560 megawatts of power with which to do work. Under normal operating conditions, and
aside from throttling effects, the output of the turbine generator is in direct relationship to
boiler output.

This tag represents the actual instantaneous unit output in units of power.

2.5.1.2.11  Heat  rate 

This is a simple calculated tag representing the sum of BTUs flowing into combustion
from oil and gas combined, divided by the amount of power created. It shows the relative
efficiency of the combustion-steam-power system as an energy-in versus energy-out
relationship. As load increases, heat rate decreases due to the thermal properties of the steam loop.
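
The heat rate calculation can be written out as a minimal sketch. The units (BTU per hour for fuel input, kW for output, giving BTU per kWh) and the function name are assumptions made for illustration; the numbers are not Canal Unit 2 data.

```python
def heat_rate(oil_btu_per_hr: float, gas_btu_per_hr: float, power_kw: float) -> float:
    """Heat rate in BTU per kWh: total fuel heat input divided by power output."""
    if power_kw <= 0:
        raise ValueError("power output must be positive")
    return (oil_btu_per_hr + gas_btu_per_hr) / power_kw


# Illustrative numbers only, not Canal Unit 2 data.
example = heat_rate(3.0e9, 2.0e9, 500_000.0)
```

Because heat rate is energy in per unit of energy out, a lower value indicates a more efficient unit.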

2.5.1.2.12  Main  steam 

The main steam temperature, in concert with the throttle pressure, is related via steam
tables to volume, enthalpy and entropy and describes the output state of the steam generating
system. Unit 2 is a sliding throttle unit capable of modulated steam temp output across
different throttle valve configurations. Steam output is essentially controlled by flow. As
the unit ramps up in load, more steam is generated from increased combustion. Steam
temperature is held (roughly) steady via modulation of flow through the turbine throttle
valves, which are sequentially opened. Once the unit reaches a certain level of output
(approximately 480 MW) all throttle valves are set in the fully open position and steam flow is
modulated by continuing to increase steam output through combustion throttling. At all levels
of output, steam temperature is controlled around 1000 degrees F for optimum turbine
operation.

2.5.1.2.13  Temperature 

These  tags  represent  the  temperature  of  superheated  steam  as  it  exits  the  secondary 
superheater  header  and  heads  to  the  high  pressure  turbine  throttle  valves. 

2.5.1.2.14 Attemperation spray

These represent the amount of cool feedwater that is sprayed into main steam between
the primary and secondary superheaters to control the temperature of the steam at the
secondary superheater outlet to the turbine.

U28300 represents fine control. This valve responds automatically and in analog fashion
to all changes in steam temperature at the secondary superheater outlet. U28301 represents
bulk control. It responds only to changes in SSH outlet temp that exceed a preset
deadband. These coarse and fine cooling controls are combined to dampen and control
steam outlet temp against oscillations or imbalances inherent in the steam system.
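
The interaction of the fine and coarse sprays can be illustrated with a small sketch. The proportional form, the gains, and the 0 to 1 valve scale are assumptions made for illustration; the text does not give the plant's actual control law.

```python
def spray_demands(temp_error: float, deadband: float,
                  fine_gain: float, coarse_gain: float) -> tuple[float, float]:
    """Return (fine, coarse) valve demands on a 0..1 scale.

    temp_error is outlet temperature minus setpoint (positive means too hot).
    """
    def clamp(x: float) -> float:
        return max(0.0, min(1.0, x))

    # Fine valve: proportional response to any positive temperature error.
    fine = clamp(fine_gain * temp_error)
    # Coarse valve: acts only on the part of the error beyond the deadband.
    excess = temp_error - deadband
    coarse = clamp(coarse_gain * excess) if excess > 0 else 0.0
    return fine, coarse
```

With a 5 degree deadband, a 2 degree error moves only the fine valve, while a 9 degree error also opens the coarse valve in proportion to the 4 degrees of excess.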

All  desuperheating  sprays  receive  their  volume  of  feedwater  from  total  feedwater 
flow. 

2.5.1.2.15  Reheat  steam 

Exhaust from the high pressure turbine stage is cycled back to the furnace via the
reheat steam loop, where it is sprayed and then re-introduced to heat exchange in the reheat
superheater. Through the reheat superheater this steam is brought back up to 1000 degF
and 580 psia, after which it is sent to the intermediate stage of the turbine. Exhaust from the
intermediate stage turbine flows to the low pressure turbine stage.

2.5.1.2.16  Temperature 

This tag represents the temperature of re-superheated steam as it heads to the
intermediate turbine stage inlet.

2.5.1.2.17  Attemperation  sprays 

These sprays function like the superheater sprays. They inject relatively cool feedwater
into the reheat steam after it has been extracted from the turbine and before it is
reheated. They function to control the temperature of the steam at the outlet of the reheat
superheater. Unlike the superheat desuperheaters, these sprays do not have separate fine
and coarse control functions.

2.5.1.2.18  Furnace  metal  temps 

These tags represent an array of thermocouples installed on the vertical legs of the
pendant superheaters. Especially in gas burning, the fire-side material temperature of these
heat exchangers can become problematic. Unit 2 has an especially large area of superheater,
which is the heat exchange surface closest to the fire itself. Because gas burns at a cooler
temperature than oil, less radiant heat is absorbed by the waterwalls of the furnace, and for
the same output of steam more heat must be passed to the steam loop through the gas
stream and the superheaters. This superheater-weighted heat transfer zone in gas burning,
combined with air flow stratification that seems to be inherent to this unit, makes careful
monitoring of these thermocouples necessary. Extended temps above 1100 degrees can
increase material fatigue significantly.

2.5.1.2.19  Secondary  superheater  metal  temps  top-bottom  L-R 

These represent the temperature of the firesides of selected evenly spaced legs of the
secondary superheater, which encounters hot gas second, after the primary superheater.
They  are  alphabetized  horizontally  across  the  superheater  surface  with  upper  representing 
the  trailing  side  and  lower  representing  the  leading  side. 

2.5.1.2.20 Primary superheater metal temps top-bottom L-R

These represent the temperature of the firesides of selected evenly spaced legs of the
primary superheater, which encounters hot gas first and is closest to the flame front. They
are  alphabetized  horizontally  across  the  superheater  surface. 

2.5.1.2.21  Reheat  superheater  metal  temps  top-bottom  L-R 

These represent the temperature of the firesides of selected evenly spaced legs of the
reheat superheater, which encounters hot gas third, after the secondary superheater and
before  the  feedwater  economizer.  They  are  alphabetized  horizontally  across  the  super- 
heater surface  with  upper  representing  the  trailing  side  and  lower  representing  the  leading 
side. 

2.5.1.2.22  Exhaust 

The gasses created by combustion flow upward through the furnace gas path across
the primary and secondary superheaters, the reheat superheater, and a superheater-like
feedwater preheater called an economizer. In this pass all steam loop heat transfer occurs.
After leaving the furnace these exhaust gasses flow into a Ljungstrom air heat exchanger
where heat is traded to the incoming combustion air. Under the pull of the induced draft
fans this now 350 degree gas passes through the unit's robust electrostatic precipitator
array, through the induced draft fans themselves and then up the stack.

The  makeup  of  the  fluegas  at  the  point  it  leaves  the  furnace  represents  the  overall  qual- 
ity of  combustion.  Key  parameters  include  how  much  oxygen  has  been  left  by  the  com- 
bustion process,  and  how  much  CO  has  been  created.  The  richness  or  leanness  of 
combustion  is  directly  evident. 

2.5.1.2.22.1 Flue gas

2.5.1.2.22.1.1  CO 

These  tags  represent  the  CO  contained  in  exhaust  gasses  as  measured  in  the  side  A 
(U27814)  and  side  B  (U27813)  furnace  outlets  to  the  exhaust  ducts  just  after  the  econo- 
mizer. 

These  are  point  measures  of  CO  in  a  very  large  duct  and  may  not  capture  exact  CO 
content.  They  also  display  extreme  side  to  side  bias  with  side  B  showing  higher  CO  con- 
tent. Although  peculiar,  this  side  to  side  bias  is  believed  to  be  a  real  feature  of  the  Canal 
Unit  2  furnace  draft.  These  tags  are  directly  related  to  the  quality  of  combustion  and  can 
serve  as  a  non  delayed  approximation  of  CO  as  it  will  be  seen  at  the  stack. 

2.5.1.2.22.1.2 O2

These tags represent the O2 contained in exhaust gasses as measured in the side A and
side  B  furnace  outlets  to  the  exhaust  ducts  just  after  the  economizer.  These  tags  are  used  in 
modeling  to  represent  the  richness  or  leanness  of  combustion.  They  are  impacted  by  and 
can  be  used  as  a  control  reference  for  forced  draft  fan  output  trim  on  air  demand.  In  addi- 
tion these  tags  are  used  by  Canal  as  a  part  of  the  CEM  NOx  calculation. 

2.5.1.2.22.1.3  Temps 

The temperature of the air being forced through the boiler at Canal Unit 2 impacts and
represents many process parameters, from combustion quality, to heat transfer distribution,
to induced draft fan output. It is also a control reference for the temperature and velocity
of exhaust leaving the stack.

2.5.1.2.22.2 Stack

The  CEM  (Continuous  Emissions  Monitoring  Unit)  consists  of  an  array  of  extraction 
gas  analyzers  in  a  computer  room  at  the  base  of  Canal's  500  ft.  stack.  The  pitots  of  these 
analyzers  sniff  mixed  exhaust  from  the  top  of  the  18  foot  wide  Unit  2  flue.  The  specific 
amounts of certain compounds measured in this gas are entered into a database. This database
serves as a binding legal history of Canal's environmental compliance. Each violation
of  emissions  limits  placed  on  certain  compounds  like  NOx  and  CO  is  recorded.  If  the  unit 
is  in  danger  of  breaking  its  allowed  daily  average  output  of  these  regulated  pollutants, 
measured  from  midnight  to  midnight,  all  steps  must  be  taken  to  regain  compliance,  includ- 
ing dropping  load.  The  cost  of  such  a  sacrifice  is  immense  and  in  effect  these  hourly  and 
daily  emissions  limits  have  become  control  variables  of  primary  importance. 
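
One way to see why the daily limit behaves like a control variable is to compute the headroom left in the averaging period. The sketch below assumes the daily figure is a plain mean of equally weighted hourly values over a 24-hour midnight-to-midnight window; the function name and the numbers are hypothetical and do not describe the regulatory calculation itself.

```python
def required_average(hourly_readings: list[float], daily_limit: float,
                     hours_per_day: int = 24) -> float:
    """Average emission rate the remaining hours must hold to meet the daily limit.

    Assumes the daily figure is the plain mean of equally weighted hourly values.
    """
    remaining = hours_per_day - len(hourly_readings)
    if remaining <= 0:
        raise ValueError("no hours remain in the averaging period")
    return (daily_limit * hours_per_day - sum(hourly_readings)) / remaining


# Twelve hours at 0.30 against a hypothetical daily limit of 0.25.
needed = required_average([0.3] * 12, 0.25)
```

If the result falls below what the unit can achieve at the current load, or is negative, compliance can only be regained by changing operation, for example by dropping load.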

2.5.1.2.22.2.1  CO 

This  tag  represents  the  CO  content  of  stack  gas  in  parts  per  million. 

It  is  worth  noting  that  CO  and  NOx  represent  conflicting  states  of  combustion  as  they 
are  currently  understood  and  managed.  To  reduce  NOx  production,  combustion  is  kept 
cool  and  rich.  NOx  formation  has  been  shown  to  positively  relate  to  increased  exposure  to 
combustion and increased temperature. Over fire air is used to complete this off-stoichiometric
combustion. Unfortunately such rich and cool (incomplete) combustion inherently
produces  increased  CO. 

2.5.1.2.22.2.2 NOx

This tag is calculated using a regulatory approved method and represents the
pounds of NOx produced by Canal Unit 2 per million BTUs.

2.5.1.2.22.2.3 Temp

Stack  temp  is  important  to  Canal  for  several  reasons.  Keeping  stack  temperature  at  a 
certain  point  guarantees  that  no  condensation  of  sulphur  products  can  occur  in  the  exhaust 
ducts, precipitators, or in the stack itself. The products of sulphur condensation are acidic
and  over  extended  periods  of  time  can  be  damaging  to  expensive  capital  equipment.  As 
long  as  sulphur  emissions  are  within  limits,  and  they  are  not  a  problem  at  Canal,  since 
Canal  uses  low  sulphur  fuel  oil,  it  is  beneficial  to  push  them  all  the  way  out  of  the  stack 
before  they  can  condense.  This  requires  sufficient  stack  gas  temperatures  and  velocities. 

Stack  Temperature  is  controlled  primarily  by  the  amount  of  preheating  that  is  done  to 
the  air  before  it  even  enters  the  windbox.  Because  of  the  relationship  of  final  stack  gas 
temperature  to  combustion  air  temperature,  and  the  relationship  of  combustion  air  temper- 
ature to  other  properties  of  combustion,  stack  temperature  can  be  an  important  and  tricky 
control  point.  Since  the  heat  losses  to  the  exhaust  through  the  exhaust  ducts,  precipitators 
and  induced  draft  fans  are  fixed,  fluegas  temperature  also  represents  stack  temperature  but 
without  gas  path  travel  delay. 


CHAPTER  3 
BOILER  OPTIMIZATION 

The most efficient point at which to reduce NOx emissions is clearly during the combustion
process [39]. As presented in Section 2.4 "NOx," simply changing the combustion
temperature  and  fuel/air  distribution  can  dramatically  affect  NOx  emissions.  The  combus- 
tion of  fossil  fuels  inside  a  large-scale  boiler,  however,  is  a  highly  complex  process;  this 
complexity  is  a  direct  function  of  the  boiler  size.  A  typical  electric  power  boiler  maintains 
a  "fireball"  which  is  3  to  5  stories  tall,  and  there  are  hundreds  of  parameters  which  affect 
the injection of fuel and air at different locations within the furnace.

The  problem  is  our  lack  of  understanding  about  how  these  combustion  parameters 
affect  NOx  formation.  This  multivariate  optimization  problem  requires  a  technology  that 
can  look  at  the  process  globally  and  determine  the  appropriate  combination  of  combustion 
controls. 

3.1  First  Principles 

The concepts behind boiler optimization are relatively simple:

• If the boiler operates in an oxygen-rich environment, i.e., with unnecessary
excess air, boiler efficiency will decrease due to the loss of sensible heat up the
stack; NOx emissions will increase concurrently.

• If the boiler operates in a fuel-rich environment, i.e., with insufficient air, boiler
efficiency will decrease due to the loss of unburned fuel. In addition, insufficient
air leads to CO formation which causes slagging and water wall corrosion,
ultimately shortening boiler life.

Between  these  two  airflow  conditions  there  is  a  zone  of  optimum  combustion.  This  is 
shown  as  a  dark  gray  band  in  Figure  5. 

3.2  Fuel  and  Air  Distribution 

Boilers for electric power and industrial steam typically have poor distribution of fuel
and air within the furnace. This causes some regions of the firebox to be fuel-rich and
other  regions  to  be  oxygen-rich.  This  situation  is  clearly  undesirable  as  it  leads  not  only  to 
unnecessary  NOx  production  and  reduced  efficiency,  but  reduced  boiler  life  expectancy 
due  to  water  wall  corrosion  and  slagging. 

The  variability  of  the  fuel-air  ratios  at  different  locations  throughout  the  boiler  is  rep- 
resented as  a  light  gray  band  in  Figure  5.  This  variability  determines  the  amount  of  aggre- 
gate air  required  to  ensure  that  all  regions  inside  the  boiler  avoid  fuel-rich  combustion. 

Figure 5: Combustion emissions characteristic versus air flow.

By improving the distribution of fuel and air in all parts of the firebox, it is possible to
reduce  the  aggregate  airflow  while  maintaining  the  same  safety  margin.  This  improvement 
is  illustrated  in  Figure  6.  The  narrower  darkly-shaded  band  which  represents  the  improved 
distribution  of  air  and  fuel  moves  closer  to  the  zone  of  optimum  combustion.  Reducing  the 
aggregate  airflow  simultaneously  increases  boiler  efficiency  and  reduces  NOx  emissions. 

Figure 6: Effect of lower O2 on combustion emissions.

The  key  challenge  in  boiler  optimization  is  identifying  which  of  the  many  controls 
affect  performance  and  how  they  need  to  be  manipulated  to  ensure  optimal  performance 
as  process  and  economic  conditions  change. 

3.3  Boiler  Tuning 

Boiler  manufacturers  and  service  companies  offer  boiler-tuning  methodologies  that 
use  the  above  principles  of  combustion  to  identify  a  limited  set  of  control  settings  which 
help  lower  NOx  and  increase  efficiency  without  the  need  for  substantial  capital  expendi- 
ture. Such  boiler  tuning  improves  unit  performance  but  does  not  begin  to  generate  the  sav- 
ings achievable  through  improved  control. 

Unfortunately, the number of control variables available to optimize performance is
too  large  for  offline  boiler  tuning  to  predict  the  optimum  settings.  Optimum  settings  vary 
with  load,  fuel  quality,  boiler  conditions,  weather,  and  other  factors  making  offline  tuning 
difficult  if  not  impossible. 

3.4  The  Role  of  CO 

Figures 5 and 6 both show an exponential rise in CO as excess air is reduced and the
boiler approaches peak efficiency. The steepness of the CO curve depends upon the
degree of mixing of fuel and air within the furnace. Poor mixing broadens the CO curve by
creating pockets of fuel-rich and oxygen-rich combustion. Together with O2, CO levels
provide the best indication of combustion quality.

A  model  for  CO  will  provide  valuable  information  about: 

•  how  well  mixed  the  fuel  and  air  are  in  the  furnace, 

•  how  individual  setpoints  can  be  used  to  improve  this  mixing,  and 

•  conditions  which  lead  to  slagging  and  water-wall  corrosion. 

The  CO  measurement  serves  as  a  key  safety  constraint  when  optimizing  the  boiler.  By 
controlling  to  CO  levels,  the  boiler  can  be  optimized  without  compromising  safety  mar- 
gins. Improved  air  and  fuel  distribution  will  merely  tighten  the  CO  curve,  resulting  in 
improved  efficiency  and  lower  NOx. 


CHAPTER  4 
CONTROL  DESIGNS 

This research investigates the applicability of neurocontrol techniques to complex process
control problems, and develops a methodology for implementing them. Towards this
end, this work will develop several detailed neural network-based control designs and
apply them to the reduction of NOx and the maintenance of acceptable CO levels in electric
power plants. Subsequent sections implement these control designs and use our
NOx case study to compare and contrast them. The control methodology will be presented
as follows:

1) A methodology for categorizing key process variables into groups that
are  required  for  all  control  designs. 

2)  A  methodology  for  formally  stating  the  control  optimization  objectives 
and  operating  constraints  using  the  aforementioned  variable  definitions. 

3)  Performance  criteria  by  which  the  various  control  designs  will  be 
judged,  based  on  these  formal  objectives  and  constraints. 

4) Four formal control designs which explicitly account for state variable
dependencies. 

4.1  Variable  Definitions 

When  designing  a  controller  for  large-scale  industrial  processes,  there  are  a  large  num- 
ber of  variables  to  be  considered.  The  physical  processes  are  typically  considered  to  have 
inputs,  disturbances,  states  and  outputs.  The  following  variable  definitions  are  proposed 
as  a  methodology  for  categorizing  all  process  variables  into  subsets;  these  subsets  will 
prove  useful  when  designing  controllers  in  general: 

1) Manipulated Variables (MVs): process inputs which have been selected
for  our  controller  to  manipulate.  The  MVs  should  be  independent  of  one 
another,  i.e.,  manipulating  one  will  not  cause  a  change  in  any  of  the  oth- 
ers. 

2)  Disturbance  Variables  (DVs):  process  inputs  or  disturbances  that  affect 
the  state  or  output  of  the  process,  but  we  either  cannot  or  have  chosen 
not to manipulate. The DVs should be independent of both each other
and the MVs.

3)  Control  Variables  (CVs):  the  process  state  or  output  variables  that  the 
controller  will  be  designed  to  control.  The  CVs  should  be  a  function  of 
the  MVs  and  DVs  or  there  is  little  hope  of  the  controller  being  able  to 
control  them. 

4)  State  Variables  (SVs):  process  state  variables,  which  are  a  function  of 
the  MVs  and/or  DVs,  that  affect  the  CVs.  Alternatively,  the  SVs  may  be 
process  output  variables  that  have  not  been  selected  for  control  but  need 
to  be  considered  as  constraints. 

Notice  that  the  MV,  DV,  SV  and  CV  definitions  categorize  the  process  logically  and 
not  physically.  These  definitions  divide  variables  based  on  how  the  controller  will  be  con- 
figured, rather  than  how  the  physical  process  is  configured.  The  MVs  will  always  be  pro- 
cess inputs,  i.e.,  can  be  manipulated  by  operators,  but  the  DVs  can  contain  both  process 
inputs  and  disturbances  depending  on  which  inputs  are  being  manipulated.  Likewise,  SVs 
and  CVs  can  each  consist  of  any  combination  of  process  states  and/or  outputs,  based  on 
which  will  ultimately  be  controlled. 

Notation:  The  categorization  of  variables  into  CVs,  SVs,  DVs  and  MVs 
will  be  used  extensively  throughout  this  work,  and  is  conceptually  consis- 
tent with  the  literature  on  optimization  and  control  [37]. 
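
The logical grouping of variables can be captured in a small data structure. The sketch below is only an illustration: the class, its method, and the tag names are hypothetical, and the assignment of any particular tag to a group is an assumption rather than the configuration used in the case study.

```python
from dataclasses import dataclass, field


@dataclass
class VariableSets:
    """Logical grouping of process tags into MVs, DVs, SVs and CVs."""
    mvs: list[str] = field(default_factory=list)
    dvs: list[str] = field(default_factory=list)
    svs: list[str] = field(default_factory=list)
    cvs: list[str] = field(default_factory=list)

    def roles(self) -> dict[str, str]:
        """Map each tag to its role, rejecting tags assigned to two groups."""
        assigned: dict[str, str] = {}
        for role in ("mvs", "dvs", "svs", "cvs"):
            for tag in getattr(self, role):
                if tag in assigned:
                    raise ValueError(f"{tag} is both {assigned[tag]} and {role}")
                assigned[tag] = role
        return assigned


# Hypothetical tag names chosen for illustration.
sets = VariableSets(mvs=["excess_o2_bias"], dvs=["load_demand"],
                    svs=["flue_gas_co"], cvs=["stack_nox", "stack_co"])
role_of = sets.roles()
```

Because the grouping is logical rather than physical, moving a tag from one list to another reconfigures the controller without any change to the process itself.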


4.2  Optimization  Objectives 

The control objective is to lower NOx. Formally this objective needs to be stated as an
objective function for optimization. Since several of the controllers developed here are
trajectory (multi-stage) controllers, this objective function will be a function of time.
Consider the single control variable $NO_x(t_0) \in \mathbb{R}$ as the measured value of NOx at time $t_0$.
An optimal control objective with fixed terminal time $T$ for minimizing $NO_x(t)$ over the
interval $t \in [t_0, t_0 + T]$ can be given by

$$J = \frac{1}{T} \sum_{t=t_0}^{t_0+T} NO_x(t). \tag{54}$$

In general, there will be more than one CV. If all CVs are to have equal impact on this
objective function, then two effects will have to be removed from the optimization objective:
1) the effect of power differences between these CVs, and 2) the current value of
each CV. The following objective function extends (54) to multiple control variables

$$J = \frac{1}{T} \sum_{t=t_0}^{t_0+T} \sum_{i=1}^{N^{cv}} \rho_i \left( Z(cv_i(t)) - Z(cv_i(t_0)) \right), \tag{55}$$

where $N^{cv}$ is the number of CVs, $\rho_i \in \mathbb{R}$ is a priority weighting factor, and
$Z(x) = (x - \mu_x)/\sigma_x$ is the z-score statistic [66]. Assuming that our controller is designed
to minimize $J$, for $\rho_i > 0$ the CV $cv_i$ will be minimized over the trajectory, while setting
$\rho_i < 0$ will maximize the output.
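
A minimal numerical sketch of this multi-CV objective follows. It assumes the objective is the time-averaged, priority-weighted sum of each CV's z-score change relative to its starting value, and that the mean and standard deviation of each CV are known from historical data; the function names and numbers are hypothetical.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x using a known mean and standard deviation."""
    return (x - mu) / sigma


def trajectory_objective(trajectories, weights, stats):
    """Evaluate the weighted z-scored objective over [t0, t0 + T].

    trajectories: one list per CV holding cv_i(t0), ..., cv_i(t0 + T)
    weights:      one priority weight per CV
    stats:        one (mean, std) pair per CV, assumed known from historical data
    """
    horizon = len(trajectories[0]) - 1  # T
    total = 0.0
    for traj, rho, (mu, sigma) in zip(trajectories, weights, stats):
        z0 = z_score(traj[0], mu, sigma)
        total += sum(rho * (z_score(v, mu, sigma) - z0) for v in traj)
    return total / horizon


# One CV falling from 10 to 6 with mean 10 and standard deviation 2.
j = trajectory_objective([[10.0, 8.0, 6.0]], [1.0], [(10.0, 2.0)])
```

A falling CV with a positive weight produces a negative objective value, which is what a minimizing controller rewards; flipping the sign of the weight rewards a rising CV instead.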

Equation (55) considers the case where CVs are to be maximized or minimized. In
general, the goal is to design a controller capable of maintaining a control setpoint. A
generalized optimization objective is therefore presented as

$$J = \frac{1}{T} \sum_{t=t_0}^{t_0+T} \sum_{i=1}^{N^{cv}} \rho_i D_i, \tag{56}$$

where $D_i$ is a desirability function that can be tailored for each CV to minimize it,
maximize it, or hold it at a setpoint $sp_i$:

$$D_i^{min} = Z(cv_i(t)) - Z(cv_i(t_0)), \tag{57}$$

$$D_i^{max} = Z(cv_i(t_0)) - Z(cv_i(t)), \text{ or} \tag{58}$$

$$D_i^{sp} = \left| Z(cv_i(t)) - Z(sp_i) \right|. \tag{59}$$

4.3  Operating  Constraints 


Constraints  will  be  used  to  ensure  that  the  optimizer  produces  a  feasible  solution.  By 
feasible  we  mean:  1)  the  MV  moves  can  be  made,  and  2)  that  when  these  MVs  are  applied 
the  plant  will  end  up  in  a  desirable  state.  Feasible  solutions  will  be  guaranteed  by  design- 
ing controllers  which  are  able  to  maintain  MV  and  SV  constraints. 

4.3.1  Manipulated  Variable  Constraints 

To ensure that the MV moves can be made, the controllers will maintain simple range
constraints. A range constraint consists of the upper and lower limits between which an MV
will be allowed to move. Formally the range constraint for MV $mv_i$ will be given by

$$C_i^{mv} = \left[ C_i^{min}, C_i^{max} \right], \tag{60}$$

where $C_i^{min}$ is the MV's absolute minimum and $C_i^{max}$ is its maximum. Controllers will be
required to provide an optimal MV trajectory $\{mv_i^*(t)\}_{t=t_0}$ such that

$$mv_i^*(t) \in C_i^{mv} \quad \forall i, t. \tag{61}$$
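
Checking a candidate trajectory against a range constraint is straightforward, as the sketch below shows. The clipping helper is not part of the methodology described here; it is a hypothetical projection step included only to show one simple way an infeasible trajectory could be repaired.

```python
def in_range(trajectory: list[float], c_min: float, c_max: float) -> bool:
    """True if every move in the trajectory lies within [c_min, c_max]."""
    return all(c_min <= v <= c_max for v in trajectory)


def clip_to_range(trajectory: list[float], c_min: float, c_max: float) -> list[float]:
    """Project each move onto [c_min, c_max] (hypothetical repair step)."""
    return [min(c_max, max(c_min, v)) for v in trajectory]


# Illustrative MV trajectory with limits 0 and 10.
moves = [2.0, 11.5, -1.0, 7.0]
feasible = in_range(moves, 0.0, 10.0)
repaired = clip_to_range(moves, 0.0, 10.0)
```

Clipping guarantees feasibility of the moves themselves but says nothing about optimality, which is why the controllers are required to produce trajectories that already satisfy the constraint.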

4.3.2 State Variable Constraints

Similarly, to ensure that controllers drive the plant to a desirable state, SV constraints
will also have to be addressed. Formally, controllers will be required to provide optimal
MV trajectories that result in SV trajectories $\{sv_i^*(t)\}_{t=t_0+1}$ such that

$$sv_i^*(t) \in C_i^{sv} \quad \forall i, t. \tag{62}$$

4.3.3  Penalty  Functions 

Each control design considered will employ an optimization algorithm during some
phase of its development. Some optimization algorithms are able to deal with constraints
directly, i.e., given knowledge of the constraints they can ensure a feasible solution.
Others, however, will have to treat constraints indirectly by addressing them with the
objective function. The most common method for addressing operating constraints in an
objective function is through the use of penalty functions [54]. For example, SV
constraints can be stated as penalty functions of the form


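$$H(sv_i, C_i^{min}) = \begin{cases} \left( sv_i - C_i^{min} \right)^2 & sv_i < C_i^{min} \\ 0 & \text{else} \end{cases}. \tag{63}$$

Generalizing the penalty function to multiple SV constraints, differences in the energy
of the respective signals will once again have to be normalized out. These effects can be
compensated for using a generalized penalty function of the form

$$H(sv_i, C_i^{min}) = \begin{cases} \left( Z(sv_i) - Z(C_i^{min}) \right)^2 & sv_i < C_i^{min} \\ 0 & \text{else} \end{cases}. \tag{64}$$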

Given a set of $N^{sv}$ SV constraints, an optimizer may satisfy these constraints by
appending their respective penalty functions to its criterion

$$J' = J + \sum_{i=1}^{N^{sv}} \rho_i^{sv} H(sv_i, C_i^{min}), \tag{65}$$

where $\rho_i^{sv}$ allows constraints to be individually prioritized. Similarly, both MV
constraints can be appended to the optimizer's criterion by defining the corresponding
penalty functions.
Note  that  implementing  constraints  with  penalty  functions  will  not  guarantee  that  the 
constraints  are  met  precisely.  If  the  constraints  are  properly  prioritized  relative  to  the  opti- 
mization objectives,  however,  these  constraints  are  easily  maintained  within  the  desired 
level  of  accuracy. 
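
The penalty approach can be illustrated with a short sketch for a lower SV limit. The quadratic form on z-scored values is an assumption, as are the function names, the weights, and the example numbers; the mean and standard deviation of each SV are taken as known from historical data.

```python
def lower_limit_penalty(sv: float, c_min: float, mu: float, sigma: float) -> float:
    """Quadratic penalty on the z-score distance below the lower limit; zero otherwise."""
    if sv < c_min:
        return ((sv - mu) / sigma - (c_min - mu) / sigma) ** 2
    return 0.0


def penalized_criterion(j, svs, limits, stats, weights):
    """Append one weighted lower-limit penalty per SV to the base criterion j."""
    penalty = sum(w * lower_limit_penalty(sv, c, mu, sigma)
                  for sv, c, (mu, sigma), w in zip(svs, limits, stats, weights))
    return j + penalty


# Two SVs with the same lower limit of 6; only the first one violates it.
j_aug = penalized_criterion(2.0, [4.0, 7.0], [6.0, 6.0],
                            [(0.0, 2.0), (0.0, 2.0)], [3.0, 3.0])
```

A small violation adds only a small cost, so an optimizer may accept it when the gain in the base objective is larger. Raising the weight on a constraint makes violations more expensive and holds that constraint more tightly.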

4.4  Performance  Criteria 

For the case study, controllers will be judged based on their ability to lower NOx while
maintaining desired CO emissions. To this end, subsequent sections will measure the
performance of controllers as a plant operator moves MVs according to their control laws.
Comparing  controller  performance,  however,  will  prove  a  difficult  task,  since  the  operator 
can  only  take  the  advice  from  one  controller  at  a  time  and  the  plant  is  constantly  changing 
state.  Although  the  controllers  may  be  able  to  deal  with  non-steady-state  conditions,  it  will 
be  nearly  impossible  to  separate  the  process  responses  to  the  state  changes  versus  the  con- 
trol action. 
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Further  complicating  matters,  while  one-time  tests  will  provide  useful  results  with 
which  to  judge  the  controllers,  they  are  not  the  only  criteria.  The  controllers  studied  will 
be  judged  by  the  following  criteria: 

1) Ability to control NOx and CO.

2) Ability of the operators to perform the recommended MV moves.

3) Flexibility with respect to changing performance objectives and operating
constraints.

4) Ability to deal with changing operating states, e.g., load changes.

4.5  Controller  Designs 

Four controller designs will be developed. The controller designs considered fall into
the broad categories of:

1) Model-Predictive Control

2)  Model-Inverse  Control 

3)  Model-Based  Direct  Control 

There  are,  however,  no  standard  recipes  for  building  these  controllers.  The  field  is  still 
immature,  and  neurocontrol  designs  presented  in  the  literature  tend  to  be  ad  hoc.  This 
work  seeks  to  not  only  develop  and  test  four  neurocontrol  designs,  but  also  to  develop  a 
generalized  methodology  for  implementing  control  designs  belonging  to  the  above 
abstract  categories.  Each  controller  must  be  able  to  deal  with  the  MV  and  SV  constraints, 
and  will  be  judged  by  the  performance  criteria  described  above. 

For each of the control designs considered, there are two distinct phases in the
implementation:

1)  Offline  training. 

2)  Online  control. 


4.5.1  Steady-State  Optimizer 

The simplest, and most prevalent, neurocontroller in the literature is the steady-state
optimizer [43][40][31]. This controller belongs to the model-predictive control family.
Model-predictive control (MPC) is not new to commercial applications in the process
control industry. The advance proposed here is the application of neural network reference
models within this controls methodology.

The concept of MPC is straightforward: combine a model for the process with an
optimizer to obtain real-time optimal setpoints. Model-predictive controllers can be
steady-state or dynamic, depending on characteristics of their underlying process models.
This section details the design of a neural network-based steady-state MPC controller to
meet the problem specifications presented in Sections 4.2, 4.3 and 4.4.

4.5.1.1  Offline  training 

Training an MPC controller follows the schematic outlined in Figure 7. Notice that
there are actually two reference models being trained: one SV model and one CV model.
The details of how to train these models will be covered in Chapter 6. Notice, however,
that the criterion used here is a model training criterion to be presented in Chapter 6, and
not the control performance objective $J$ presented in Section 4.4 "Performance Criteria."
The model definitions required by the steady-state optimizer are:

1) Steady-State SV Model: $sv = \mathrm{ssSVModel}(mv, dv)$

2) Steady-State CV Model: $cv = \mathrm{ssCVModel}(mv, dv, sv)$

Figure 7: Offline training and retuning configuration for steady-state optimizer.

The reason to have a CV model is obvious: it provides the reference model that the
optimizer uses to determine its optimal MV setpoints. The motivation for having an SV
model, however, is somewhat less apparent. The problem is that changes made to the MVs
by the optimizer will change not only the CVs but also the SVs. The optimizer must
therefore consider the effect that the MVs have on the SVs if it is to accurately predict
their effect on the CVs. Note that the CV model has an input space that consists of MVs,
DVs and SVs.

4.5.1.2  Online  control 

The online control configuration is illustrated in Figure 8. Here an optimizer calculates
$\mathrm{ArgMin}_{mv^*}\{J\}$ using the SV and CV reference models developed during model
training. The optimizer starts with the current value of the MVs, $mv^* = mv$, uses the SV
model to estimate the current SVs $sv^*$, which are then used, along with the current value
of the DVs $dv$, to estimate the current CVs $cv^*$. The optimizer then iteratively updates
its estimate for the optimal MVs $mv^*$ to minimize its objective function $J$.

Figure 8: Online control configuration for steady-state optimizer.

Both direct and descent-based optimization can be used for MPC. If the number of
MVs is small, then a direct optimizer provides an efficient alternative. As the number of
MVs grows, however, direct optimization quickly becomes impractical. Descent-based
optimization is possible because the SV and CV models are capable, via backpropagation,
of calculating the gradient of $J$ with respect to their inputs, i.e., their input sensitivities
given the sensitivities at their outputs. In this manner, the optimizer calculates the CV
sensitivities $\partial J/\partial cv^*$, from which the CV model is able to calculate the SV
sensitivities $\partial J/\partial sv^*$ and the partial MV sensitivities
$\partial J^{cv}/\partial mv^*$, from which the SV model calculates the remaining partial MV
sensitivities $\partial J^{sv}/\partial mv^*$ (the superscript marks the model through which
the sensitivity passes), and finally the optimizer is able to update its optimal MV estimate
using the MV gradient of

$$\frac{\partial J}{\partial mv^*} = \frac{\partial J^{cv}}{\partial mv^*} + \frac{\partial J^{sv}}{\partial mv^*}. \qquad (66)$$

This is really just the backpropagation of backpropagations, a.k.a. more fun with the
chain rule.
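
The two-path gradient can be checked on a toy problem. The sketch below is an illustration only: the scalar functions `sv_model` and `cv_model` are hypothetical stand-ins for the trained networks, and the objective is simply the CV itself. It sums the direct MV sensitivity through the CV model and the indirect sensitivity routed through the SV model, then compares the total against a finite-difference estimate.

```python
import math


def sv_model(mv, dv):
    """Toy SV reference model (assumed form, not a trained network)."""
    return math.tanh(0.5 * mv + 0.2 * dv)


def cv_model(mv, dv, sv):
    """Toy CV reference model with MV, DV and SV inputs (assumed form)."""
    return (mv - 1.0) ** 2 + 0.3 * dv + 2.0 * sv


def predicted_objective(mv, dv):
    """J evaluated through the cascade: the SV model feeds the CV model."""
    return cv_model(mv, dv, sv_model(mv, dv))


def total_mv_gradient(mv, dv):
    """Total dJ/dmv as the sum of the two partial MV sensitivities."""
    sv = sv_model(mv, dv)
    dj_dcv = 1.0                              # J = cv in this toy objective
    dj_dmv_direct = dj_dcv * 2.0 * (mv - 1.0)  # partial via the CV model
    dj_dsv = dj_dcv * 2.0                      # SV sensitivity from the CV model
    dsv_dmv = 0.5 * (1.0 - sv ** 2)            # backpropagation through the SV model
    return dj_dmv_direct + dj_dsv * dsv_dmv


def numeric_mv_gradient(mv, dv, eps=1e-6):
    """Central finite difference, used only to check the analytic result."""
    up = predicted_objective(mv + eps, dv)
    down = predicted_objective(mv - eps, dv)
    return (up - down) / (2.0 * eps)
```

Dropping the second term would ignore the effect that an MV move has on the SVs, which is exactly the omission the SV model is meant to prevent.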

Several optimization methods were tested for the optimizer, along with various
techniques for dealing with the constraints. The most effective combination identified was
to use the unconstrained conjugate gradients method in combination with an objective
function which included the SV and MV constraint penalty functions

where the MV, SV, DV and CV variable sets, along with their corresponding constraints,
are defined in Section 6.6.3 "Final Variable Sets," and all priorities have been set to 1. The
details of the conjugate gradients method will be presented in Section 6.5 "Learning
Algorithm."

Penalty functions can negatively impact the performance of a descent-based optimizer
by adding complexity to the performance surface that has little to do with the underlying
problem. This is particularly true when the constrained variables lie outside of their
constrained values. For the SV constraints, there is no choice but to use penalty functions.
For MVs, however, there are alternatives, because the MVs always start at their current
values, which are always within the constraints. Hence, there is little to no overhead to
using MV constraints for our online optimizer. In terms of the performance surface, the
constraints can be thought of as placing a guardrail on both sides of our current position in
weight-space along our path, while having little impact on the local topography of the road.
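
One way to picture the guardrail idea is a descent step that is clipped back into the MV limits. This is an assumption made for illustration, not necessarily the mechanism used in this work, and the function names are hypothetical.

```python
def clip_to_bounds(mv, lower, upper):
    """Project each MV back into its [lower, upper] operating range."""
    return [min(max(m, lo), hi) for m, lo, hi in zip(mv, lower, upper)]


def guarded_descent_step(mv, grad, step, lower, upper):
    """Take a plain descent step, then apply the guardrail (projection)."""
    trial = [m - step * g for m, g in zip(mv, grad)]
    return clip_to_bounds(trial, lower, upper)


# The second MV would leave its range and is held at the upper limit.
new_mv = guarded_descent_step([0.5, 0.9], [1.0, -1.0], 0.2, [0.0, 0.0], [1.0, 1.0])
```

Because the starting point is already feasible, the projection changes nothing until a limit is actually reached, so the local shape of the performance surface is left untouched.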
4.5.2 Steady-State Model-Inverse Controller

The next controller design belongs to the model-inverse control (MIC) family.
Conceptually, model-inverse control is straightforward: train a model to predict the MVs
from the current and known DVs, SVs and CVs; then, given a desired CV setpoint, this model
can be used directly to obtain the required MVs. Implementing an MIC controller is also
straightforward and can work reasonably well, provided that the relationship between MVs
and CVs is in fact invertible. This section details the design of a neural network-based
MIC controller, designed to meet the problem specifications presented in Sections 4.2, 4.3
and 4.4.

4.5.2.1  Offline  training 

Training an MIC controller follows the schematic outlined in Figure 9. Once again,
notice that the MV model is implemented by separate inverse-SV (ISV) and
inverse-MV (IMV) models. The details of how to train these models will be
covered in Chapter 6. The model definitions required by the steady-state model-inverse
controller are:

1) Steady-State ISV Model: $sv = \mathrm{ssISVModel}(cv, dv)$

2) Steady-State IMV Model: $mv = \mathrm{ssIMVModel}(cv, sv, dv)$

Figure 9: Offline training and retuning configuration for model-inverse controller.

Analogous to our MPC controller, two models have been developed which, when
combined, can invert the process. The reason to have an IMV model is obvious: it provides
the inverse model that the controller uses to determine optimal MV setpoints. The ISV model
is needed because not all CV-SV combinations are feasible. Given a specified CV target, the
ISV model estimates the corresponding SVs, which are presented to the IMV model.

4.5.2.2  Online  control 

The online control configuration is illustrated in Figure 10. If a known target existed
for the CVs, the online control implementation would actually be quite trivial. One
complexity is that the exact value for the lowest achievable NOx from the controller for a
given set of conditions is not known. Another complication with MIC is how to deal with
constraints. If one applies a target CV to the input of the inverse model, it will predict a set
of inputs which it believes would have achieved this target. The problem is that the model
does not understand the MV or SV constraints, and if one of the inputs it predicts falls
outside these constraints, the controller cannot provide the required setpoints. This is
analogous to the problem faced with SV or CV constraints for MPC. The implementation
outlined in Figure 10 uses an optimizer in order to overcome both of these issues. Clearly,
CV constraints are straightforward.

Figure 10: Online control configuration for model-inverse controller.
The MIC controller uses an optimizer to calculate $\mathrm{ArgMin}_{cv^*}\{J\}$ using the
ISV and IMV reference models developed during model training. The optimizer starts with the
current value of the CVs, $cv^* = cv$, uses the ISV model to estimate the current SVs $sv^*$,
which are then used, along with the current value of the DVs $dv$, by the IMV model to
estimate the current MVs $mv^*$. The optimizer then iteratively updates its estimate for the
optimal CVs $cv^*$, and hence the resulting MVs $mv^*$, to minimize its objective function $J$.

Once again, both direct and descent-based optimization can be used for MIC, and once
again a conjugate gradients-based optimizer was selected. Descent-based optimization is
possible because the ISV and IMV models are capable, via backpropagation, of calculating
the gradient of $J$ with respect to their inputs, i.e., their input sensitivities given the
sensitivities at their outputs. In this manner, the optimizer calculates the MV sensitivities
$\partial J/\partial mv^*$, from which the IMV model is able to calculate the SV sensitivities
$\partial J/\partial sv^*$ and the partial CV sensitivities $\partial J^{imv}/\partial cv^*$,
from which the ISV model calculates the remaining partial CV sensitivities
$\partial J^{isv}/\partial cv^*$ (the superscript marks the model through which the sensitivity
passes), and finally the optimizer is able to update its optimal CV estimate using the CV
gradient of

$$\frac{\partial J}{\partial cv^*} = \frac{\partial J^{imv}}{\partial cv^*} + \frac{\partial J^{isv}}{\partial cv^*}. \qquad (68)$$

The optimizer's objective function, which includes the SV and MV constraint penalty
functions, is the same objective function used by our steady-state optimizer. The only
difference is how the sensitivities flow through the system, as outlined above.

4.5.3  Dynamic  Model-Predictive  Controller 

The steady-state optimizer considered above is a member of the MPC family. The vast
majority of MPC applications use models which are first-principles based [37]. Since it is
not possible to build an accurate first-principles model of NOx, a new steady-state
optimizer for MPC using neural network models was developed. The vast majority of MPC
applications are dynamic, however. The steady-state optimizer only considers the effect
that MV changes will have on the unit in steady-state conditions.

This section develops a dynamic neural network-based MPC controller. The main
differences between this controller and our steady-state optimizer are that it:

1) Understands the dynamics of the process.

2) Provides a trajectory of MV setpoints designed to optimize the path of
the unit into the future, rather than an optimal steady-state position. In
other words, the controller considers not only where you're going but how
you'll get there.

The concept behind this controller's operation is identical to that of the steady-state
optimizer: combine a model for the process with an optimizer to obtain real-time optimal
setpoints. The only difference is that the models are now dynamic, and the optimal
setpoints become optimal setpoint trajectories.

This section details the design of a neural network-based dynamic MPC controller to
meet the problem specifications presented in Sections 4.2, 4.3 and 4.4.

4.5.3.1  Offline  training 

Training a dynamic MPC controller follows a schematic similar to that outlined in Figure
7, with the inclusion of each variable's explicit dependence on time $t$, as illustrated in
Figure 11. Here the SV and CV models perform single-stage prediction with respect to
the MVs and DVs; notice that the CV model uses the current value of the SVs, $sv(t+1)$.
The reasons for this configuration will become clear when we consider the online control
implementation in the next section.


Figure 11: Offline training and retuning configuration for steady-state optimizer.

Refer to Chapter 6 for details on training the dynamic SV and CV reference models
used by the dynamic MPC controller. For now we simply state the model definitions
required by the dynamic MPC controller:

1) Dynamic SV Model: $sv(t+1) = \mathrm{dSVModel}(mv(t), dv(t))$

2) Dynamic CV Model: $cv(t+1) = \mathrm{dCVModel}(mv(t), dv(t), sv(t+1))$

4.5.3.2  Online  control 

The online control configuration is similar to that of the steady-state
optimizer presented in Figure 8. Here a dynamic optimizer is required, however. The
optimizer calculates $\mathrm{ArgMin}_{mv^*(t)}\{J(t)\}$ using the dynamic SV and CV models
developed during model training. The steady-state optimizer used an application of the chain
rule for ordered partial derivatives, which has been coined "backpropagation" [40]. From the
perspective of the chain rule, our new optimizer is identical and only the criterion changes.
From the perspective of the literature, this algorithm has been coined "backpropagation
through time" [40][70].

The optimizer starts with the current value of the MVs, $mv^*(t) = mv(t)$; uses the SV
model to estimate the resulting SVs $sv^*(t+1)$; which are then used, along with the
current value of the DVs $dv(t)$, to estimate the resulting CVs $cv^*(t+1)$. Notice that each
estimate can rely on both present and past values of the inputs. The optimizer will then
repeat this process over the time interval $t \in (t_0, t_0+T]$ to produce the MV, SV and CV
trajectories $\{mv^*(t)\}_{t=t_0}^{t_0+T-1}$, $\{sv^*(t)\}_{t=t_0+1}^{t_0+T}$ and
$\{cv^*(t)\}_{t=t_0+1}^{t_0+T}$, respectively.

The objective function, which includes the SV and MV constraint penalty functions,
can now be calculated


(69) 


where  the  MV,  SV,  DV  and  CV  variable  sets,  along  with  their  corresponding  constraints, 
are  defined  in  Section  6.6.3  "Final  Variable  Sets,"  and  all  priorities  have  been  set  to  1. 
The optimizer then iteratively updates its estimate for the optimal MV trajectories,
$\{mv^*(t)\}_{t=t_0}^{t_0+T-1}$, to minimize its objective function $J(t)$. Each step in the
iteration performs the following, starting with $t = t_0+T-1$ and iterating down to
$t = t_0$: first, the optimizer calculates the CV sensitivities
$\partial J(t)/\partial cv^*(t)$, from which the CV model is able to calculate the SV
sensitivities $\partial J(t)/\partial sv^*(t)$ and the partial MV sensitivities
$\partial J^{cv}(t)/\partial mv^*(t-1)$, from which the SV model calculates the remaining
partial MV sensitivities $\partial J^{sv}(t)/\partial mv^*(t-1)$ (the superscript marks the
model through which the sensitivity passes), and finally the MV sensitivity at time $t-1$
can be calculated as

$$\frac{\partial J(t)}{\partial mv^*(t-1)} = \frac{\partial J^{cv}(t)}{\partial mv^*(t-1)} + \frac{\partial J^{sv}(t)}{\partial mv^*(t-1)}. \qquad (70)$$

Once the backward pass is complete, the optimizer is able to update its optimal
MV trajectory estimate $\{mv^*(t)\}_{t=t_0}^{t_0+T-1}$ using the MV gradient trajectory

$$\left\{ \frac{\partial J(t)}{\partial mv^*(t)} \right\}_{t=t_0}^{t_0+T-1}. \qquad (71)$$

Notice that the sensitivities at time $t$ depend on the sensitivities in the future. This is
because the models' variables at time $t$ depend on the variables in the past, i.e., the models
are dynamic. Hence the term "backpropagation through time."

This entire optimization cycle is run at each time step $t_0$. The optimizer derives the
next $T-1$ MV moves, and the first MV setpoint, $mv^*(t_0)$, is applied to the unit. At this
point the entire process is repeated.
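
The optimize, apply, repeat cycle can be sketched on a toy problem. Everything in the sketch below is an assumption made for illustration: a scalar linear model `sv(t+1) = A*sv(t) + B*mv(t)` replaces the neural network reference models, the cost is a simple tracking term plus an MV effort term, plain gradient descent replaces conjugate gradients, and the toy model also plays the role of the plant.

```python
A, B, TARGET, W_MV = 0.8, 0.5, 1.0, 0.1  # toy model and cost weights (assumed)


def rollout(mv_traj, sv0):
    """Forward pass: predict the SV trajectory produced by an MV trajectory."""
    svs = [sv0]
    for mv in mv_traj:
        svs.append(A * svs[-1] + B * mv)
    return svs


def cost(mv_traj, sv0):
    """J over the horizon: squared tracking error plus an MV effort term."""
    svs = rollout(mv_traj, sv0)
    tracking = sum((sv - TARGET) ** 2 for sv in svs[1:])
    return tracking + W_MV * sum(mv * mv for mv in mv_traj)


def bptt_gradient(mv_traj, sv0):
    """Backward pass from the end of the horizon down to its start."""
    svs = rollout(mv_traj, sv0)
    grad = [0.0] * len(mv_traj)
    from_future = 0.0  # sensitivity arriving from later time steps
    for t in reversed(range(len(mv_traj))):
        dj_dsv = 2.0 * (svs[t + 1] - TARGET) + A * from_future
        grad[t] = B * dj_dsv + 2.0 * W_MV * mv_traj[t]
        from_future = dj_dsv
    return grad


def optimize_trajectory(mv_traj, sv0, steps=200, lr=0.05):
    """Plain gradient descent on the MV trajectory (a simplification)."""
    mv = list(mv_traj)
    for _ in range(steps):
        g = bptt_gradient(mv, sv0)
        mv = [m - lr * gi for m, gi in zip(mv, g)]
    return mv


def receding_horizon(sv0, horizon=5, n_steps=3):
    """Re-optimize at every step and apply only the first MV move."""
    sv, applied = sv0, []
    mv_traj = [0.0] * horizon
    for _ in range(n_steps):
        mv_traj = optimize_trajectory(mv_traj, sv)
        applied.append(mv_traj[0])
        sv = A * sv + B * mv_traj[0]           # toy plant equals the toy model
        mv_traj = mv_traj[1:] + [mv_traj[-1]]  # warm start for the next cycle
    return applied, sv
```

The `from_future` term in the backward pass is what makes the sensitivity at one time step depend on the sensitivities at later steps.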

4.5.4  Model-Reference  Adaptive  Controller 

The final controller design considered belongs to the model-reference adaptive control
(MRAC) family. Like the dynamic MPC controller, the MRAC controller understands
process dynamics and provides a trajectory of MV setpoints which optimize both where
you are going and how you get there. The fundamental difference between these two
controllers is how this optimal trajectory is derived. The MPC design utilized an online
optimizer to calculate this trajectory, while the MRAC design develops a neural
network-based controller which is able to calculate the optimal trajectory directly. Hence
this is our first direct controller, i.e., one that calculates MV setpoints directly.

Notice that the MIC design would have provided a direct online controller, were it not
for:

1)  the  lack  of  a  known  target  NOx  level,  and 

2)  the  requirements  for  MV  and  SV  constraints. 

The MRAC design is able to overcome both of these hurdles by building knowledge of
the best achievable NOx level and by building all of the constraints directly into the
controller. The main advantage of the MRAC design is online response time. There is no
optimization to run; one simply presents the controller with the current, and past, state of
the process, and it generates an MV setpoint as quickly as a neural network can think. These
benefits do not come for free, however. The main drawbacks to the MRAC design are:

1)  Extensive  offline  training  and  retuning  requirements. 

2)  Inflexible  online  configuration,  with  respect  to  changing  optimization 
objectives  and  operating  constraints. 

This section details the design of a neural network-based dynamic MRAC controller
to meet the problem specifications presented in Sections 4.2, 4.3 and 4.4.

4.5.4.1  Offline  training 

Training an MRAC controller requires two stages. The first stage is identical to training
and retuning the dynamic MPC controller. Here, dynamic SV and CV models are
developed using the same steps outlined in Figure 11. The second stage uses these models to
train the controller with offline data, as illustrated in Figure 12. The offline training is
similar to the online optimization which is performed for the MPC design, except that this
optimization is performed across the training dataset rather than online.


Figure 12: Offline training and retuning configuration for model reference controller.

Once again a dynamic optimizer is required, and the objective function is given by
(69). To train the controller, the optimizer calculates $\mathrm{ArgMin}_{W^{CL}}\{J(t)\}$,
where $W^{CL}$ are the weights of the control law neural network.

Training the control law (CL) model starts with the actual values for the DVs, SVs and
CVs, and random initial weights for its CL model, $W^{CL}$. Starting at time
$t_0 = T_0 + 1$, where $T_0$ is the first sample in the training dataset, training uses the
CL model to estimate the resulting MVs $mv^*(t)$, which are then used to estimate the
resulting SVs $sv^*(t+1)$ and CVs $cv^*(t+1)$. This process is repeated over the time
interval $t \in (t_0, t_0+T]$ to produce the MV, SV and CV trajectories
$\{mv^*(t)\}_{t=t_0}^{t_0+T-1}$, $\{sv^*(t)\}_{t=t_0+1}^{t_0+T}$ and
$\{cv^*(t)\}_{t=t_0+1}^{t_0+T}$, respectively.

The training algorithm then iteratively updates its estimate for the optimal CL model
weights, $W^{CL}$, to minimize its objective function $J(t)$. Each step in the iteration
performs the following, starting with $t = t_0+T$ and iterating down to $t = t_0+1$: first,
the training algorithm calculates the CV sensitivities $\partial J(t)/\partial cv^*(t)$, from
which the CV model is able to calculate the SV sensitivities $\partial J(t)/\partial sv^*(t)$
and the partial MV sensitivities $\partial J^{cv}(t)/\partial mv^*(t-1)$, from which the SV
model calculates the remaining partial MV sensitivities
$\partial J^{sv}(t)/\partial mv^*(t-1)$ (the superscript marks the model through which the
sensitivity passes), and finally the MV sensitivity at time $t-1$ can be calculated as

$$\frac{\partial J(t)}{\partial mv^*(t-1)} = \frac{\partial J^{cv}(t)}{\partial mv^*(t-1)} + \frac{\partial J^{sv}(t)}{\partial mv^*(t-1)}. \qquad (72)$$

The MV sensitivities are finally passed to the CL model, which backpropagates them
to derive its weight gradients $\partial J/\partial W^{CL}$, which the training algorithm is
able to use to update its control law's weight estimate.

The training algorithm then increments $t_0$ and repeats the entire process, until the
training algorithm has converged. When $t_0 = T_N - T$, $t_0$ is reset to $t_0 = T_0 + 1$.
The CL model is trained in increments of $T$ because the SV and CV models have a limited
prediction horizon, the time before their estimates are no longer valid. By resetting the
state of these models to the actual state of the unit after $T$ samples, we are able to train
the CL model within the prediction horizon of the SV and CV models.
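
The windowed training loop can be sketched on a toy problem. All specifics below are assumptions made for illustration: the control law has a single weight `w` with `mv(t) = w * (TARGET - sv(t))`, a scalar linear model stands in for the SV and CV reference models, windows advance by `T` samples, and plain gradient descent is used. At the start of every window the model state is reset to the recorded plant state.

```python
A, B, TARGET = 0.8, 0.5, 1.0  # toy reference model and CV target (assumed)


def window_cost_and_grad(w, sv_start, T):
    """Forward pass over one window, then a backward pass for dJ/dw."""
    svs, errs = [sv_start], []
    for _ in range(T):
        errs.append(TARGET - svs[-1])      # input seen by the control law
        mv = w * errs[-1]                  # control law output
        svs.append(A * svs[-1] + B * mv)   # reference model prediction
    cost = sum((sv - TARGET) ** 2 for sv in svs[1:])
    grad_w, carry = 0.0, 0.0               # carry: sensitivity from the future
    for k in reversed(range(T)):
        dj_dsv = 2.0 * (svs[k + 1] - TARGET) + carry
        dj_dmv = B * dj_dsv
        grad_w += dj_dmv * errs[k]         # backpropagate through the control law
        carry = A * dj_dsv - w * dj_dmv    # paths via the model and the control law
    return cost, grad_w


def train_control_law(recorded_sv, T, w=0.0, lr=0.02, epochs=100):
    """Train w window by window, resetting the model to recorded data."""
    for _ in range(epochs):
        for t0 in range(0, len(recorded_sv) - T + 1, T):
            _, g = window_cost_and_grad(w, recorded_sv[t0], T)
            w -= lr * g
    return w


recorded = [0.0, 0.2, 0.1, 0.3, 0.0, 0.2, 0.1, 0.3]  # made-up recorded SVs
w_trained = train_control_law(recorded, T=4)
```

Because each window restarts from recorded data, prediction errors in the reference model cannot accumulate beyond `T` steps, which is the point of training within the prediction horizon.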

4.5.4.2  Online  control 

The online control configuration for the MRAC design is straightforward, as
illustrated in Figure 13. Simply supply the controller with the current SVs, DVs and CVs, and
it outputs the next MV setpoint. This setpoint embodies knowledge about the optimal
achievable NOx, SV constraints, MV constraints and the trajectory through which it will
drive the process into the future.

Figure 13: Online control configuration for model reference controller.


Clearly,  the  controller  is  only  as  good  as  its  underlying  reference  models.  In  addition, 
considerable  care  must  be  taken  to  ensure  that  the  training  data  contains  regions  of  the 
input  space  where  the  SV  and  MV  constraints  have  been  exercised.  The  design  is  easily 
augmented  with  limiters  to  guarantee  that  MV  constraints  are  maintained.  However,  there 
is  little  that  can  be  done  to  guarantee  that  the  SV  constraints  are  maintained. 


CHAPTER  5 
DATA  PREPARATION 

Given the detailed control designs just presented, the next step is to implement the
actual controllers by developing the required reference models. Both reference model and
controller implementations require a significant amount of process data. Data collection is
the most important aspect of any modeling or optimization project. There is a common
saying, "junk in, junk out," and this study was relentless in reinforcing this lesson. Applying
the most sophisticated modeling and/or optimal control algorithms in the world will not
make up for problems with data preparation.

With the advanced distributed control systems (DCS) and supervisory control and data
acquisition (SCADA) systems readily available in today's process plants, the relative
quantity and quality of available data is overwhelming. Much of the statistics and
modeling literature has been dedicated to the problems faced when drawing inferences from
small sample spaces. Modern processing plants are anything but data limited. The relevant
problems are just the opposite: how to draw meaningful inferences from a massive sample
space.

The following section presents solutions for the most significant challenges faced in
preparing data for modeling and optimization. Much of what is presented in this section
was learned the hard way, during modeling and optimization.

5.1 Data Management

The power plant treated in this work collects and stores tens of thousands of variables
from various sensors and actuators throughout the plant. These variables were collected by
the DCS and forwarded to a data historian called PI, marketed by OSI, Incorporated.

Notation: The author will reserve the term "variable" to represent process
states with a specific physical interpretation, which are considered relevant
for modeling. The DCS will collect information from many sensors or
actuators, each of which represents a single process variable. The specific
points collected by the DCS will be referred to as "tags."

The DCS works with exception-based sampling, a non-uniform sampling scheme.
Each tag is given an absolute deviation above which an exception is raised. Raised
exceptions are forwarded to the process control algorithms within the DCS, as well as to PI.
The PI data historian then applies a time stamp and stores the exception. The time
resolution of a sample can be trusted within about ±2 seconds, and the quantization error is
approximately equal to the deviation settings for each tag in the DCS.
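
The effect of an absolute deviation setting can be illustrated with a minimal sketch. The rule used below, storing a sample only when it differs from the last stored value by more than the deviation, is a simplifying assumption; it is not a description of the actual DCS or PI logic, and the function name is hypothetical.

```python
def exception_samples(times, values, deviation):
    """Keep a sample only when it moves more than `deviation` from the last stored value."""
    stored = [(times[0], values[0])]
    for t, v in zip(times[1:], values[1:]):
        if abs(v - stored[-1][1]) > deviation:
            stored.append((t, v))
    return stored


# Six raw readings reduce to three stored exceptions with a deviation of 0.5.
kept = exception_samples([0, 1, 2, 3, 4, 5], [10.0, 10.2, 10.4, 11.0, 11.1, 9.0], 0.5)
```

Under this rule any change smaller than the deviation is invisible in the archive, which is why the quantization error is roughly equal to the deviation setting.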

This exception-based data acquisition scheme allows the data historian to store an
impressive amount of data. Uniformly sampled trends are provided through simple
function calls to PI's API. Any subset of the 10,000+ tags can be easily recalled for arbitrary
time ranges over the last year of continuous plant operation. Initial concerns over the data
quality given this non-uniform sampling scheme were quickly dismissed. Significant
tuning of the tag deviations was required, however.
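
A uniformly sampled trend can be rebuilt from stored exceptions in several ways. The sketch below assumes a zero-order hold, repeating the last stored value until the next exception; it is a generic local function written for illustration and does not represent the PI API or the interpolation the historian actually applies.

```python
def resample_hold(stored, start, stop, period):
    """Rebuild a uniform trend by holding the most recent stored value.

    `stored` is a time-ordered list of (time, value) pairs and `start` is
    assumed to be no earlier than the first stored time.
    """
    out, idx, t = [], 0, start
    while t <= stop:
        while idx + 1 < len(stored) and stored[idx + 1][0] <= t:
            idx += 1
        out.append((t, stored[idx][1]))
        t += period
    return out


# Three stored exceptions expanded onto a 2-second grid.
trend = resample_hold([(0, 10.0), (3, 11.0), (5, 9.0)], start=0, stop=6, period=2)
```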


5.2  Variable  Selection 

For the case study of boiler optimization, it is clear that we are looking for process
variables that are related to the measurement of NOx or CO, and impact the combustion
process with respect to these measurements. With the massive number of variables to
choose from in the database, the first line of defense is to use our first-principles
knowledge of the process.

Interviews  were  conducted  with  engineering  and  operations  personnel  from  the  plant. 
They  were  asked  to: 

1) Identify all tags that represent the variables NOx and CO.

2) Identify all variables that have any effect on combustion parameters like
emissions, fuel and air flows, temperatures or pressures inside the
boiler.

3) Rank these variables with respect to their effect on NOx and CO as
essential, secondary or minimal.

4) Identify all of the tags that represent or directly impact the listed
variables.

5) Classify each of these tags as either:

• Setpoint: can be manipulated by operators via the DCS.

• Tunable Parameter (TP): can be manually manipulated by "engineers
with wrenches".

• Disturbance: cannot be manipulated but has an effect on
combustion.

• State: represents a particular state of combustion which will have
an effect on emissions; can be a function of setpoints and/or
disturbances but cannot be manipulated directly.

6) Identify operating constraints and concerns that require monitoring
when manipulating any of the setpoints.

This process, and subsequent iterations, produced 76 essential, 100 secondary and 300
minimal tags for Canal Electric Generating Station. The essential tags are listed in the
Appendix. The operators and engineers identified CO as their overwhelming operating
constraint. The process of variable selection shall be continued in Section 6.2 "Model
Definitions."

5.3  Validation 

To assess the quality of the data being collected by the data historian, one month of
data was extracted for each of the essential tags above, between 1/1/1998 and
2/1/1998. The sampling interval was set to 5 seconds. One of the most useful data
validation tools was simply to trend each tag over various intervals. The eye is able to spot
most data integrity issues that are commonly missed by statistical indicators.

The standard descriptive statistics of mean, variance, max and standard error were
calculated for each tag over the datasets. These statistics were compared against
first-principles knowledge to look for the data integrity issues considered below.

5.3.1  Quantization  and  Clipping 

The most common data integrity problem encountered with industrial data historians
is having invalid range settings. Range settings for event-based data acquisition systems
are analogous to sampling rate for uniform sampling. There are three important settings:
the data minimum, data maximum and absolute deviation. When either the data minimum
setting is too high or the data maximum setting is too low, the archived data is
clipped. In addition, if the deviation setting is too large, quantization errors can
significantly corrupt the archived data. The best way to detect these errors is simple visual
inspection, and the only remedy is to correct the settings.

5.3.2  Missing  Data 

Along with each sample, the historian provided status information indicating any
errors that were detected by the DCS when collecting the sample. Considering any error as
missing, the dataset contained 10.46% missing data. Since one of the goals of this project
is to evaluate the performance of both static and dynamic models, all missing values will
need to be accounted for despite the fact that their numbers are quite low.

Figure 14: Daily % missing across February dataset.


Figure 14 illustrates the daily sum of missing values across the dataset. One can
easily identify a region of four consecutive days where the plant was missing data. This
region corresponds to a unit shutdown. Note that removing large blocks of time from the
data will not be an issue for dynamic modeling, as long as the valid data falls into a
relatively small number of blocks with a large number of samples that are free of missing
values. Removing this region from our analysis, the missing percentages become less than
1%.

Since the remaining errors are highly sporadic and down-sampling of the data is
required, there exists a simple solution for dealing with the remaining errors. The collected
data was sampled at 5-second intervals. Down-sampling to 60-second intervals will be
applied by averaging 12 consecutive samples and decimating. If 6 or fewer of the samples
are in error, then the remaining samples will be averaged and the decimated sample's status
will be set to "good". If more than 6 samples are in error, then the decimated sample's
status will be set to "missing". This procedure removed all remaining errors from this
dataset.
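The averaging-and-decimation rule can be sketched in a few lines of Python. This is a minimal illustration, not the historian's or the author's implementation: the function name `decimate_with_status`, its parameters, and the use of `None` for a missing value are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def decimate_with_status(values: List[float],
                         good: List[bool],
                         block: int = 12,
                         max_bad: int = 6) -> List[Tuple[Optional[float], str]]:
    """Average each block of `block` samples, ignoring samples flagged bad.

    A block with at most `max_bad` bad samples yields (mean of good samples, "good");
    otherwise it yields (None, "missing"). Trailing samples that do not fill a
    complete block are dropped (an assumption of this sketch).
    """
    out: List[Tuple[Optional[float], str]] = []
    for start in range(0, len(values) - block + 1, block):
        chunk = zip(values[start:start + block], good[start:start + block])
        valid = [v for v, ok in chunk if ok]
        n_bad = block - len(valid)
        if n_bad <= max_bad and valid:
            out.append((sum(valid) / len(valid), "good"))
        else:
            out.append((None, "missing"))
    return out


# Two 60-second blocks of 5-second samples: the first has 6 bad samples,
# the second has 7 bad samples.
samples = [1.0] * 6 + [99.0] * 6 + [2.0] * 5 + [0.0] * 7
status = [True] * 6 + [False] * 6 + [True] * 5 + [False] * 7
print(decimate_with_status(samples, status))
```

With the default thresholds, a block keeps its "good" status as long as at least half of its twelve samples are valid, which matches the rule stated in the text.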

Although the above procedure was able to clean all of the data for this dataset, it is
possible that future datasets may still contain missing values. In such cases, missing
data will be replaced with data interpolated between the nearest surrounding valid
samples.

5.3.3  Outliers 

Treating each tag as a random variable, the tags were standardized to a Z-score by
subtracting the mean and dividing by the standard deviation,

$$Z_i(x) = \frac{x - \mu_i}{\sigma_i}, \tag{73}$$

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of tag $i$. The Z-score
removes all effects of offset and measurement scale, and its value is unbounded in both the
positive and negative directions. Table 1 shows the probability of the absolute value of a
Z-score exceeding some limit for normally distributed variables.


Table 1: Probability of Z-Score exceeding value.

|Z-Score|    P(Exceeding)
1.28         0.2
1.64         0.1
1.96         0.05
2.58         0.01

The probability of a Z-score magnitude exceeding a limit $a$ was calculated for our
datasets as

$$P(|Z_i| > a) = \frac{1}{N} \sum_{j=1}^{N} \begin{cases} 1 & |Z_i(x_j)| > a \\ 0 & \text{else} \end{cases}, \tag{74}$$

where $N$ is the total number of samples in the dataset. Tags with $P(|Z_i| > 1.64) > 0.20$
were considered candidates for filtering or smoothing operations, and tags with
$P(|Z_i| > 2.58) > 0.02$ were considered candidates for outlier removal.
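As a rough sketch of this screening step, the fraction of samples whose Z-score magnitude exceeds a limit can be computed directly from a tag's samples. The function names are hypothetical, the population standard deviation is an assumption (the text does not say which estimator was used), and the tag is assumed not to be constant.

```python
import statistics
from typing import List, Tuple


def z_scores(x: List[float]) -> List[float]:
    """Standardize samples by subtracting the mean and dividing by the std. dev."""
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x)  # population std. dev.; an assumption of this sketch
    return [(v - mu) / sigma for v in x]


def exceed_fraction(x: List[float], a: float) -> float:
    """Fraction of samples with |Z| > a."""
    z = z_scores(x)
    return sum(1 for v in z if abs(v) > a) / len(z)


def screen_tag(x: List[float]) -> Tuple[bool, bool]:
    """Return (filtering candidate, outlier-removal candidate) using the stated limits."""
    return exceed_fraction(x, 1.64) > 0.20, exceed_fraction(x, 2.58) > 0.02


# A flat signal with three large spikes out of 100 samples.
tag = [0.0] * 97 + [100.0] * 3
print(exceed_fraction(tag, 2.58), screen_tag(tag))
```

For normally distributed data, Table 1 gives exceedance probabilities of 0.1 and 0.01 at these two limits, so the thresholds of 0.20 and 0.02 flag tags whose tails are about twice as heavy as expected.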

Filtering operations were most commonly applied to tags with sensor cleaning and
calibration spikes. Sensor calibration can occur as often as hourly for sensors exposed to the
flue gas. Cleaning and calibration will corrupt the data with short-duration spikes within
the range of normal data for the tag. Lowpass filtering was used to remove these effects
when encountered.

Tags with outliers had large spikes, well outside the range of normal data. Outliers
were most often caused by glitches during data acquisition. Errors in the data acquisition
system characteristically caused large instantaneous changes in the tag's value for a short
period; after this period the tag's value would instantaneously return to its true value.
Outliers were removed by treating the affected samples as missing and replacing them with
linearly interpolated data.
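The replacement of flagged samples (outliers or missing values) by linear interpolation between the nearest valid neighbours might look like the following sketch. The function name and the handling of flagged runs at the very start or end of the series (held at the nearest valid value) are assumptions; the text does not specify edge behaviour.

```python
from typing import List


def interpolate_flagged(values: List[float], bad: List[bool]) -> List[float]:
    """Replace each run of flagged samples with a straight line between its
    nearest valid neighbours. Runs touching either end of the series are held
    at the nearest valid value (an assumption of this sketch)."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not bad[i]:
            i += 1
            continue
        j = i
        while j < n and bad[j]:
            j += 1
        left, right = i - 1, j  # indices of the valid samples around the run
        for k in range(i, j):
            if left >= 0 and right < n:
                frac = (k - left) / (right - left)
                out[k] = values[left] + frac * (values[right] - values[left])
            elif left >= 0:
                out[k] = values[left]
            elif right < n:
                out[k] = values[right]
        i = j
    return out


# A two-sample acquisition glitch between valid readings of 1.0 and 4.0.
print(interpolate_flagged([1.0, 50.0, 50.0, 4.0], [False, True, True, False]))
```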

5.4  Time  Constants 

There are many delays or dead-times inherent to the process. The natural response
time of most MV actuators, however, is less than 1 second. Their effect on combustion is
felt within seconds. Much of this will take place faster than it can be resolved. If its effect
can be measured within our sampling resolution, then the dynamic models should have no
trouble extracting the temporal relationships.

There are two temporal lags that will give the modeling effort trouble. First, some
setpoint actuators are driven by PID loops which have been damped to prevent operators
from overreacting. The steady-state settling time of such loops can be as long as 90
minutes. This situation has been compensated for by using a simple first-order low-pass filter
to damp the actuator setpoint signal to match the actuator's response characteristics.

Second, some sensors, particularly the continuous emissions monitors (CEMs), can
have significant extraction times. The Canal Electric Generating Station CEMs measuring
NOx and CO add an 8 to 10 minute dead-time between the formation of the gases in the
boiler and the time they are recorded. This situation has been handled by shifting these tags
within the dataset, such that setpoints and process outputs are aligned.
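Both compensations are simple signal operations, sketched below under stated assumptions. The filter uses a backward-Euler discretization of a first-order lag, which is one common choice and not necessarily the one used in this work; the time constant `tau`, the function names, and the use of `None` as padding are all hypothetical.

```python
from typing import List, Optional


def first_order_lowpass(signal: List[float], dt: float, tau: float) -> List[float]:
    """Discrete first-order lag: y[k] = y[k-1] + alpha * (u[k] - y[k-1]),
    with alpha = dt / (tau + dt). Assumes a non-empty signal; the filter
    state is initialized to the first sample."""
    alpha = dt / (tau + dt)
    y = signal[0]
    out = []
    for u in signal:
        y = y + alpha * (u - y)
        out.append(y)
    return out


def advance(signal: List[float], delay_samples: int) -> List[Optional[float]]:
    """Shift a delayed measurement earlier by `delay_samples` so that it lines
    up with the inputs that caused it; the tail is padded with None."""
    return list(signal[delay_samples:]) + [None] * delay_samples


# Damp a setpoint step (60 s samples, hypothetical 60 s time constant).
print(first_order_lowpass([0.0, 1.0, 1.0, 1.0], dt=60.0, tau=60.0))

# Remove a 2-sample dead-time from an analyzer reading.
print(advance([1.0, 2.0, 3.0, 4.0, 5.0], 2))
```

At a 60-second sampling interval, an 8 to 10 minute analyzer dead-time corresponds to a shift of roughly 8 to 10 samples.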

5.5  Normalization 

In support of modeling, all tags were normalized. Variables applied to a neural
network should fall within the neuron's activation limits. All of the neural networks
considered in this study utilize a tanh activation function; therefore, all variables were
normalized to fall within the range [-0.9, 0.9]. Assuming normal distributions, the tag Z-scores
were used for normalization such that only 1% of the data would fall outside the neuron's
activation limits, according to

$$\hat{x} = \frac{1}{r}\left(\frac{x - \mu_x}{\sigma_x} + 2.58\right) - 0.9, \tag{75}$$

where

$$r = \frac{2.58 - (-2.58)}{0.9 - (-0.9)} = 2.867$$

is the ratio between the acceptable Z-score range and the neuron's activation range.
Denormalization was then performed according to

$$x = \mu_x + \sigma_x\left(r\,(\hat{x} + 0.9) - 2.58\right). \tag{76}$$
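A small round-trip check makes the mapping concrete: Z-scores in [-2.58, 2.58] are mapped linearly onto [-0.9, 0.9] and back. The function and constant names below are assumptions for illustration, and the tag mean and standard deviation are made-up values.

```python
Z_LIMIT = 2.58    # |Z| exceeded by about 1% of normally distributed data
ACT_LIMIT = 0.9   # target range [-0.9, 0.9] for tanh units

# Ratio between the Z-score range and the activation range (about 2.867).
R = (Z_LIMIT - (-Z_LIMIT)) / (ACT_LIMIT - (-ACT_LIMIT))


def normalize(x: float, mu: float, sigma: float) -> float:
    """Map a raw tag value to the network input range via its Z-score."""
    z = (x - mu) / sigma
    return (z + Z_LIMIT) / R - ACT_LIMIT


def denormalize(x_hat: float, mu: float, sigma: float) -> float:
    """Inverse of normalize(): recover the raw tag value."""
    return mu + sigma * (R * (x_hat + ACT_LIMIT) - Z_LIMIT)


mu, sigma = 10.0, 2.0                           # hypothetical tag statistics
print(round(R, 3))                              # ratio r
print(normalize(mu + Z_LIMIT * sigma, mu, sigma))  # upper Z limit maps near 0.9
print(denormalize(normalize(7.3, mu, sigma), mu, sigma))
```

Because both ranges are symmetric about zero, the mapping reduces to dividing the Z-score by r; the longer form is kept here so that it also covers the general, asymmetric case.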


CHAPTER  6 
MODELING 

The modeling objectives for this work are inherently tied to the control designs and the
case study presented in the preceding sections. To this end, the objective of this section is
to determine the best model architecture for each of the model definitions required by the
various control designs considered. Candidate model architectures will be judged based on
their ability to predict process dynamics. The following architectures will be considered:

1) Auto-Regressive Moving Average Model (ARMA)

2)  Multi-layer  Perceptron  (MLP) 

3)  Time-Delay  Neural  Network  (TDNN) 

4)  Gamma  Neural  Network  (GNN) 

5)  Nonlinear  State-Space  Model  (NLSS) 

Notice that the ARMA model has been included to provide a benchmark and to
validate the application of nonlinear control strategies for the case study application.

6.1  Methodology 

The  methodology  for  developing  the  "best"  models  will  be  as  follows: 

1) Model Definitions: The ultimate goal is to find the best possible models
for NOx and CO. The control designs presented require that these processes
are represented using specific model definitions. Detailed specifications
for the models are developed, i.e., identifying the specific
process inputs and outputs to be used by the models.

2) Datasets: A 3-month dataset will be generated using the methodology
presented in Chapter 5, and divided into disjoint training, cross-validation
and testing regions.

3) Learning Algorithm: The learning algorithm used for model training
combines the Polak-Ribiere algorithm [49] with a line search as presented
by Brent [14]. This algorithm is presented along with the details
describing its application.

4)  Performance  Criteria:  The  criteria  for  selecting  the  "best"  model  are 
presented. 

5) Variable Pruning: An MLP is constructed, and starting from the input
sets determined from first-principles knowledge in Section 5.2 "Variable
Selection," the input sets are pruned to the smallest possible set of
relevant variables.

6) Architecture Selection: Optimal parameters (e.g., number of hidden layers,
processing elements and memory taps) will be individually determined
for each combination of architecture and process output using a
direct search methodology.

7)  Analysis:  The  results  will  be  analyzed  to  find  the  best  steady-state  and 
dynamic  models  for  each  model  definition. 

6.2  Model  Definitions 

In support of the control designs presented in Chapter 4, models will be developed
according to the following model definitions:

1) Steady-State SV Model: sv = ssSVModel(mv, dv)

2) Steady-State CV Model: cv = ssCVModel(mv, dv, sv)

3) Steady-State ISV Model: sv = ssISVModel(cv, dv)

4) Steady-State IMV Model: mv = ssIMVModel(cv, sv, dv)

5) Dynamic SV Model: sv(t + 1) = dSVModel(mv(t), dv(t))

6) Dynamic CV Model: cv(t + 1) = dCVModel(mv(t), dv(t), sv(t + 1))

where the vectors mv, dv, sv and cv represent the steady-state values of the MV,
DV, SV and CV variable sets, respectively; mv(t), dv(t), sv(t) and cv(t) represent
their respective values at time t; and by cv = ssCVModel(mv, dv, sv) we mean that
ssCVModel is a model which takes the vectors mv, dv and sv as inputs, and produces
the vector cv as an output.

Notation: The term model definition will be used to describe the input/output
space of a model along with whether it is steady-state or dynamic,
where model refers to a particular realization of a predictor which
implements the model definition. There will be many models developed which
implement each of the above model definitions.
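To illustrate how the two dynamic definitions fit together, the sketch below chains a dynamic SV model into a dynamic CV model, so that the predicted state sv(t + 1) is available as an input when predicting cv(t + 1). This is only one plausible way of composing the definitions; the function `predict_next` and the toy linear stand-in models are hypothetical and carry no information about the real process.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def predict_next(d_sv_model: Callable[[Vector, Vector], Vector],
                 d_cv_model: Callable[[Vector, Vector, Vector], Vector],
                 mv_t: Vector,
                 dv_t: Vector) -> Tuple[Vector, Vector]:
    """One-step prediction: sv(t+1) = dSVModel(mv(t), dv(t)), then
    cv(t+1) = dCVModel(mv(t), dv(t), sv(t+1))."""
    sv_next = d_sv_model(mv_t, dv_t)
    cv_next = d_cv_model(mv_t, dv_t, sv_next)
    return sv_next, cv_next


# Toy stand-ins for trained models (purely illustrative coefficients).
def toy_sv_model(mv: Vector, dv: Vector) -> Vector:
    return [0.5 * mv[0] + 0.1 * dv[0]]


def toy_cv_model(mv: Vector, dv: Vector, sv: Vector) -> Vector:
    return [2.0 * sv[0] - 0.3 * mv[0]]


print(predict_next(toy_sv_model, toy_cv_model, [1.0], [2.0]))
```

Any model that implements a given definition, whatever its architecture, could be passed in place of the toy functions, which is the distinction the notation paragraph draws between a model definition and a model.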

6.2.1  Variable  Definitions

The model definitions were based on definitions for the MV, DV, SV and CV variable
sets. The tag list in the Appendix is categorized according to the variable definitions, which
shall then be used to implement the model definitions.

6.2.1.1  Control  variables 

The case study considers a single control variable, NOx. While CO is a process output
and maintaining appropriate levels of CO is also an objective, CO shall be considered a
constrained SV. Notice that NOx has been marked as a CV in the Essential Tag List under
the field labeled "Type."

6.2.1.2  Manipulated  variables 

The MVs are defined as the variables that we want the controller to manipulate. There
are many inputs that could be manipulated, but there is a cost associated with the number
of manipulated inputs. These costs include:

1) generalization costs associated with the "curse of dimensionality,"

2) computational costs associated with the optimization during both
modeling and control, and

3) operational costs associated with getting the operators to implement the
MV setpoints.

The  goal  is  for  the  controller  to  manipulate  those  variables  which  have  the  greatest 
impact  on  combustion,  but  the  number  of  inputs  should  be  restricted.  Beginning  with  a 
wide  list  of  all  potential  MV  candidates  developed  from  first-principles  knowledge  of  the 
process,  variables  with  the  least  impact  on  the  SVs  and  CVs  are  pruned.  The  pruning 
methodology  will  be  presented  in  Section  6.6  "Variable  Pruning."  The  complete  list  of 
potential  MVs  is  presented  in  the  Essential  Tag  List. 

6.2.1.3  Disturbance  variables 

The disturbance variables are defined as variables that have an effect on
combustion and are independent of all MVs and other DVs. Variables which are functions
of MVs or other DVs will be considered SVs. The costs associated with MVs all apply
to DVs as well, except for the operator manipulation costs. In addition to true process
disturbances, the DVs will contain MVs that we have chosen not to manipulate. The initial
DVs are listed in the Essential Tag List.

6.2.1.4  State  variables 

As mentioned above, CO will be considered a state variable. This will allow the
controller to constrain its allowable levels. This will not, however, be our only SV. The initial
set of SVs considered for modeling is presented in the Essential Tag List.

6.2.1.5  Variable  representation 

You have probably noticed that many of the variables chosen to represent initial MVs, DVs,
SVs or CVs represent the same underlying process variable. The process in Section 5.2
"Variable Selection" resulted in subsets of tags that represent or impact the same physical
variable, but they each represent the variable in a slightly different way.

For example, consider the physical variable of gross airflow, the total amount of air
entering the boiler. It is clear from first principles that this variable has a significant
impact  on  NOx  (see  Chapter  3).  The  forced  draft  (FD)  fans  deliver  gross  airflow  to  the 
combustion  process  (see  Section  2.5  "Fossil-Fired  Power  Generation").  The  output  of  the 
FD  fans  is  derived  from  the  boiler  master  signal.  Forced  draft  output  is  specified  along 
with  fuel  flow  by  the  fuel-air  curve  of  the  boiler.  The  DCS  has  five  different  representative 
tags  for  the  variable  of  gross  airflow: 

1)  Fan  vane  position:  The  FD  fans  are  constant  speed  fans,  meaning  that 
the  fan  shaft  turns  at  a  constant  speed  while  more  or  less  air  with  more 
or  less  initial  spin  can  be  dumped  into  the  blades  by  opening  or  closing 
the  vanes. 

2) Fan amps: If the inlet vanes are opened wider, the flow is greater,
more work is done, and the amperage must increase.

3) FD fan demand: The fuel-air curve gives a total air flow requirement,
which is characterized within the DCS as a demand signal for the FD
fan controller.

4) O2 trim: Prior to presenting the FD fan demand signal to the FD fan
controller, the operator is provided with a trim signal that can shift this
demand by ±10% of its range. This trim allows the operators to add or
remove gross air at their discretion.

5) FD fan setpoint: This DCS signal, which is simply the sum of the
FD fan demand and the O2 trim, is presented to the FD fan control logic,
where a PID control loop maintains the desired airflow.

Notice that all of these representations are in our essential tag list marked as MVs.
Clearly, these variables are not all independent. In addition to "Type" you will notice a
field in the Essential Tag List called "Group." This field will be used to identify tags
which represent the same physical process variables. The following sections present a
methodology for reducing the MVs, DVs and SVs down to a minimal set which has the
greatest impact on combustion. All of the variables from a group can be removed from
consideration, but at most one variable from each group will be allowed in our final variable
sets.
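The "at most one variable per group" constraint can be illustrated with a short sketch. It is not the pruning methodology of this work: the impact scores, the threshold, the function name, and the "Hypothetical tag X" entry are invented for the example, and only the grouping rule itself comes from the text.

```python
from typing import Dict, List, Tuple


def select_one_per_group(tags: List[Tuple[str, str, float]],
                         min_score: float = 0.0) -> Dict[str, str]:
    """Keep, for each group, the single tag with the highest impact score.

    `tags` holds (tag name, group, score). A group whose tags all score
    below `min_score` is dropped entirely."""
    best: Dict[str, Tuple[str, float]] = {}
    for name, group, score in tags:
        if score < min_score:
            continue
        if group not in best or score > best[group][1]:
            best[group] = (name, score)
    return {group: name for group, (name, _score) in best.items()}


# Hypothetical impact scores for three gross-airflow representations
# and one tag from another, low-impact group.
candidates = [
    ("FD fan setpoint", "gross airflow", 0.8),
    ("Fan amps", "gross airflow", 0.5),
    ("O2 trim", "gross airflow", 0.6),
    ("Hypothetical tag X", "other group", 0.05),
]
print(select_one_per_group(candidates, min_score=0.1))
```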

6.3  Datasets 

Section 5.3 presented the sampling methodology for retrieving process data from the
data historian. This methodology was applied to produce a contiguous time series of 3
months of process data from 1/18/98 to 4/18/98 while the unit was operating continuously.
This data also included parametric testing of key MVs thought to affect the process
outputs of interest.

The 3 months of available data had to be divided into datasets for training,
cross-validation and testing of each of the models. Due to the temporal nature of the data and the
dynamic nature of the models being considered, the datasets would each have to be
contiguous in time yet disjoint from one another. In order to accomplish this while trying to keep
both the cross-validation and testing data as close as possible to the training data, the data
was divided into 3 contiguous and disjoint time regions as follows: the first 2 weeks were
used for cross-validation, weeks 2 through 10 were used for training, and the remaining 2
weeks were used as a blind test set.

Each time region was then sampled according to the method outlined in Section 5.3 to
produce training, cross-validation and testing datasets. The cross-validation and testing
regions were only sampled once, producing only one cross-validation and one testing
dataset common to all models considered. The training region, however, was sampled
D = 30 times, producing 30 training datasets as 30 observations of the underlying
process. These 30 observations were sampled by offsetting the start of each sampling by 2
seconds from the previous one, where ±2 seconds is the approximate resolution of the data
historian.
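The partitioning and repeated sampling can be sketched as follows. The function names and the sample counts in the usage lines are hypothetical; the sketch only shows the ordering (cross-validation, then training, then testing) and the 2-second offsets described in the text.

```python
from typing import List, Tuple


def split_regions(n_samples: int, n_cv: int, n_test: int) -> Tuple[range, range, range]:
    """Divide a time-ordered series into three contiguous, disjoint index
    ranges: cross-validation first, training in the middle, testing last."""
    cv = range(0, n_cv)
    train = range(n_cv, n_samples - n_test)
    test = range(n_samples - n_test, n_samples)
    return cv, train, test


def offset_start_times(region_start_s: float,
                       n_obs: int = 30,
                       offset_s: float = 2.0) -> List[float]:
    """Start time of each of the n_obs samplings of the training region,
    each offset by offset_s seconds from the previous one."""
    return [region_start_s + k * offset_s for k in range(n_obs)]


cv_idx, train_idx, test_idx = split_regions(n_samples=130, n_cv=20, n_test=20)
print(len(cv_idx), len(train_idx), len(test_idx))
print(offset_start_times(0.0)[:3], offset_start_times(0.0)[-1])
```

Keeping each region contiguous preserves the temporal ordering that the dynamic models depend on, which a random shuffle of samples would destroy.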


6.4  Performance  Criteria 

The objective of this section is to develop the "best" possible models based on the
model definitions presented in Section 6.2. To this end, many models will be developed
for each of the six model definitions; these models will then compete for being the "best"
for a particular model definition. The "best" models will then be used in Chapter 7.

This  section  will  formally  present  the  criteria  by  which  the  models  will  be  judged. 
Clearly,  these  criteria  should  be  closely  related  to  the  objective  function  used  to  train  the 
models. 

6.4.1  Objective  Function 

All of the model definitions considered are for predictors, i.e., they solve the
generalized nonlinear regression problem. Model training will therefore fall into the category of
supervised learning, where there is a known desired response $d(t)$ for the model.
Although there are many objective functions which can be applied to this problem, the
ordinary mean-squared-error (MSE) objective function is by far the most common and successful.
The MSE objective function will be given by

$$J = \frac{1}{T} \sum_{t=1}^{T} \left\| d(t) - y(t) \right\|^2, \tag{77}$$

where $y(t)$ is the output of the model, a real-valued vector of the same dimension as
$d(t)$, and $T$ is the temporal length of the training dataset. Note that both steady-state and
dynamic models will be developed; therefore, the objective function has been written in a
form that is applicable to both. When the model is steady-state, $T$ can be considered the
number of samples in the training dataset.

The subsequent sections will be training multiple models for a given model definition,
and then comparing them to see which performed the best. The MSE could serve as a
suitable candidate for our "best" metric. There are a few problems with comparing models
based on MSE, however:

1) It is difficult to derive meaning from the MSE value associated with
testing an individual model.

2) It cannot be used to compare the performance between individual outputs,
because the MSE is significantly affected by the power of the individual
output variables.

3) It cannot be used to compare the performance of models tested on separate
datasets, since the power of each variable will be highly variable
across datasets.

Since all of the models will be tested against a common dataset, the third problem will
not be a factor here. The most compelling reason not to use MSE as a performance
criterion is simply one of interpretation. It is very difficult to tell how well a model is
performing relative to the process. We will instead introduce two related performance criteria
which will not only serve as metrics with which to compare models against one another,
but whose values possess a simple physical interpretation.

6.4.2  Normalized  Mean-Squared  Error 

The first metric is a close relative of the MSE, called the normalized mean-squared
error (NMSE). This metric normalizes the MSE relative to what is sometimes called the
"trivial predictor." This predictor is simply an estimate of the process statistical mean,
and the NMSE is given by

$$\mathrm{NMSE}_i = \frac{\sum_{t=1}^{T} \left( d_i(t) - y_i(t) \right)^2}{\sum_{t=1}^{T} \left( \mu_i^d - d_i(t) \right)^2}, \tag{78}$$

where $\mu_i^d$ is the $i$-th element of the vector $\mu^d$, which is the statistical mean of the
desired response. Notice that the NMSE is based on individual output variables, which allows us to
compare the model performance for specific model outputs.
The main advantages of the NMSE over the MSE are:

1) Differences in the power of individual outputs are normalized out.

2) Simple interpretation can be used to provide a feel for model performance
based on the NMSE value alone. The best performance a model
can have would be to match the desired response precisely, which
would result in an NMSE = 0. Similarly, an NMSE = 1 indicates that
the model is doing no better than predicting the process mean. While
this is clearly not the worst performance a model can have, it is certainly
cause for concern.

6.4.3  Correlation 

Both the MSE and NMSE metrics represent the model's error with respect to the
desired response. The error of a model describes how well it matches the actual value of
the desired response. It is very possible for the model to have a large error, and still
contain valuable information about the process. This is particularly true when using models
for optimization and control applications.

Consider the example illustrated in Figure 15. It is clear that the NMSE and MSE
between variable $d$ and model $m_1$ will be lower than the respective errors between $d$ and
$m_2$. It is also clear that $m_2$ has captured more of the dynamic information of $d$.

Figure 15: Example of variables with large NMSE but high R.

This example illustrates the need for an additional metric. We are building models
which are not going to be used for their ability to forecast the actual value of a variable,
but rather to explain the cause-and-effect relationships between the model's input and
output variables. For such applications, correlation provides a better metric. The correlation
of a model can be calculated as follows:

$$R^2 = \frac{\sigma_{dy}^2}{\sigma_{dd}\,\sigma_{yy}}, \tag{79}$$

where

$$\sigma_{xy} = \frac{1}{T} \sum_{t=1}^{T} \left( x(t) - \mu^x \right)\left( y(t) - \mu^y \right). \tag{80}$$

An $R^2 = 0$ suggests that there is no linear relationship between the model's output and
the desired response, while $R^2 = 1$ suggests that they are identical (in a linear sense).

Both NMSE and $R^2$ provide useful metrics with which to judge model performance. In
fact, used together they provide direct insight into the bias-variance dilemma of model
development. These will be the primary metrics used to assess model performance in this
section.
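The three criteria are easy to compute side by side, and a small numerical example shows how a model can have a large NMSE yet a perfect correlation. The sketch below treats a single output variable; the function names and the toy series are assumptions, and the covariance uses a 1/T normalization, which cancels in the correlation ratio.

```python
from typing import List


def mean(v: List[float]) -> float:
    return sum(v) / len(v)


def mse(d: List[float], y: List[float]) -> float:
    """Mean-squared error between desired response d and model output y."""
    return sum((a - b) ** 2 for a, b in zip(d, y)) / len(d)


def nmse(d: List[float], y: List[float]) -> float:
    """MSE normalized by the error of the trivial (mean) predictor."""
    mu = mean(d)
    return sum((a - b) ** 2 for a, b in zip(d, y)) / sum((mu - a) ** 2 for a in d)


def covariance(x: List[float], y: List[float]) -> float:
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def r_squared(d: List[float], y: List[float]) -> float:
    """Squared correlation; undefined (division by zero) for a constant series."""
    return covariance(d, y) ** 2 / (covariance(d, d) * covariance(y, y))


d = [0.0, 1.0, 2.0, 3.0, 4.0]
m1 = [2.0, 2.0, 2.0, 2.0, 3.0]       # close to the mean, little dynamic content
m2 = [10.0, 12.0, 14.0, 16.0, 18.0]  # offset and scaled, but tracks d exactly

print("m1:", nmse(d, m1), r_squared(d, m1))
print("m2:", nmse(d, m2), r_squared(d, m2))
```

In this toy case the first model has the lower error but only a moderate correlation, while the second has a very large error and a correlation of one, mirroring the situation that Figure 15 is described as showing.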

6.5  Learning  Algorithm

The learning algorithm, or algorithms, for a neural-network-based control application
must be able to deal with the following configurations:

1)  Training:  where  the  initial  parameters  of  models  and/or  controllers  are 
determined  for  the  first  time. 

2)  Retuning:  where  the  model  and/or  controller  parameters  are  adjusted 
online  to  allow  the  system  to  compensate  for  changes  in  its  operating 
environment. 

3)  Optimization:  where  the  next  MV  setpoint,  or  setpoint  trajectory  must 
be  determined. 

The  proper  choice  of  a  learning  algorithm  needs  to  consider  the  specific  requirements 
for  these  three  situations.  The  first  decision  is  whether  the  learning  algorithm  should  be 
incremental  or  batch.  It  is  the  incremental  learning  schemes  that  are  usually  seen  in  close 
relationship  with  online  adaptation  as  seen  in  retuning  and  optimization.  Indeed,  the  field 
of  adaptive  controls  applies  incremental  schemes  almost  exclusively. 

By contrast, batch learning schemes are usually considered to be committed to offline,
nonadaptive operation, as found in training. However, it is equally possible to apply batch
learning in adaptive configurations running parallel to the plant in operation [31]. In fact,
there are many reasons to consider batch schemes when dealing with complex nonlinear
control systems. Most importantly, batch schemes are required for:

1) second-order approximation methods requiring line searches. This concerns
particularly the conjugate gradient methods and Powell's algorithm.
These algorithms are not able to keep the search directions
conjugate without line searches.

2)  most  global  optimization  methods.  Except  for  simulated  annealing  and 
evolutionary  algorithms,  global  optimization  algorithms  require  the  cost 
function  to  be  evaluated  at  various  points  of  the  state  space  with  the 
results  compared. 

The computational efficiency lost by not using second-order approximations is by
itself reason enough to restrict our learning to batch schemes. For first-order methods,
there are no estimates of convergence even for exact quadratic functions. In other words,
they can converge arbitrarily slowly. Second-order methods can be compared by the
number of cost function evaluations that are necessary to reach the minimum of a quadratic
function. Hrycej [31] demonstrates how the conjugate gradient algorithm can be applied
to speed up both the backpropagation and backpropagation-through-time algorithms.
These accelerations use only n(K + 1) cost function evaluations, where n is the number
of free parameters and K is the number of evaluations required by the line search.

It should be noted that first-order gradient descent is not the last resort for
incremental learning. The Kalman training algorithm proposed by Singhal and Wu [57] and
extended and applied to several neurocontrol problems by Puskorius and Feldkamp
[51][52] exploits second-order information by building a parameter covariance matrix, an
analogy to the Hessian matrix in variable metric local optimization methods. Even with
sophisticated methods such as the Kalman training algorithms, incremental learning has
many practical limitations. For the identification of strongly nonlinear plants of realistic
size and difficulty, like those considered in this work, these methods can require tens of
thousands of sampled measurements [52]. Furthermore, they do not support global
optimization.

The robustness of a neurocontroller solution will be predominantly a function of its
performance during training, retuning and optimization. Consider the consequences of:

1) Showing up and trying to win operator confidence with poorly trained
models.

2) Having models which had worked properly while in closed-loop
control of a power plant suddenly stop working following a
retraining.

3) A controller not being able to see an optimal control trajectory because of the
non-convex nature of the performance surface which exists between two
distinct operating modes of the unit.

Since the learning schemes described above will be applied to highly complex
nonlinear systems, and thus highly non-convex performance surfaces, global optimization will
be considered a requirement. The notation ArgMin{J}, subscripted by the optimization
variables, shall be used to refer to the global optimization of cost functional J with respect
to those variables. The global optimization algorithms used throughout the rest of this work
are detailed next.

6.5.1  Global  Optimization  Algorithm 

An extremum of a completely arbitrary function of real-valued arguments is not computable
because it requires a combinatorial search in infinite spaces. Global optimization
approaches are typically forced to make explicit or implicit assumptions about the function,
or are satisfied with trying to allocate the available resources so that the probability of
finding the optimum is as high as possible.

If an exhaustive search is not feasible, it is logical to try to cover the cost function
domain as far as possible with the given resources and then to use some local optimization
method. It is then sufficient if every interesting local basin of attraction is hit by a
starting point. For n-dimensional domains, the trivial coverage by all combinations of only
two values per dimension leads to generating $2^n$ starting vectors. These correspond to the
corners of a hypercube. An even sparser coverage uses $2n$ vectors corresponding to starting
points in the centers of the hypercube sides.

For training and retuning learning tasks, these methods are still far too computationally
burdensome. For these configurations we will have to be satisfied with simply generating
a set of starting vectors uniformly. For optimization learning tasks, however, there are
far fewer free parameters, one for each MV. For this configuration, global optimization
shall be determined by running $2N_{MV}$ local optimizations from starting vectors with
elements

$$ x_{ij} = \begin{cases} 1 & i = 2j - 1 \\ -1 & i = 2j \\ 0 & \text{otherwise} \end{cases} \qquad (81) $$

where $x_{ij}$ is the $j$-th element of the $i$-th starting vector.
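
The construction in Equation 81 can be illustrated with a short sketch. It assumes that the MVs have been scaled so that each lies in [-1, 1], which the text does not state explicitly; the function name `side_center_starts` is introduced here only for illustration.

```python
def side_center_starts(n_mv):
    """Return the 2*n_mv starting vectors at the centers of the hypercube sides.

    Assumes (not stated in the text) that every MV is scaled to [-1, 1].
    Row i (1-based) holds +1 in element j when i = 2j - 1, -1 when i = 2j,
    and 0 everywhere else.
    """
    starts = []
    for j in range(n_mv):
        for sign in (1.0, -1.0):
            vector = [0.0] * n_mv
            vector[j] = sign
            starts.append(vector)
    return starts


# Three MVs give six starting vectors, one local optimization per vector.
starts = side_center_starts(3)
```

Each vector would seed one run of the local optimizer, and the best of the resulting local optima would be taken as the global estimate.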


6.5.2  Local Optimization Algorithm

The conjugate gradient algorithm used for all local optimization is given by:

I:    Given a set of weights $W_e$, determine a search direction $P_e$ using the
Polak-Ribière algorithm presented in Section 3

II:   Optimize the objective function $J(W_e + l P_e)$ with respect to the
scalar variable $l$, using the line search algorithm presented in Section
2.1.2.5

III:  Update the weights according to $W_{e+1} = W_e + l P_e$

IV:  Repeat until either $e > e^{max}$ or $J(W_e) - J(W_{e+1}) < \Delta J^{min}$

where $e$ is the index of iteration, $e^{max}$ is the maximum number of epochs, and
$\Delta J^{min}$ is the minimum performance improvement for early stopping.
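
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the line search is a plain golden-section search on a fixed interval, standing in for the line search of Section 2.1.2.5 (not reproduced here), and the Polak-Ribière coefficient is clipped at zero, a common restart safeguard. All function and parameter names are introduced only for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def axpy(w, step, p):
    """Return w + step * p."""
    return [wi + step * pi for wi, pi in zip(w, p)]


def golden_line_search(f, w, p, max_step=1.0, iters=60):
    """Golden-section search for the step in [0, max_step] minimizing f(w + step*p).

    Stand-in line search (an assumption); max_step is an arbitrary bracket.
    """
    ratio = (5 ** 0.5 - 1) / 2
    lo, hi = 0.0, max_step
    for _ in range(iters):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if f(axpy(w, a, p)) < f(axpy(w, b, p)):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


def conjugate_gradient(f, grad, w, max_epochs=100, min_improvement=1e-12):
    g = grad(w)
    p = [-gi for gi in g]                      # first direction: steepest descent
    cost = f(w)
    for _ in range(max_epochs):                # stop when e > e_max
        step = golden_line_search(f, w, p)     # step II
        w = axpy(w, step, p)                   # step III
        new_cost, new_g = f(w), grad(w)
        # Step I for the next epoch: Polak-Ribiere coefficient, clipped at zero.
        diff = [a - b for a, b in zip(new_g, g)]
        beta = max(0.0, dot(new_g, diff) / (dot(g, g) + 1e-300))
        p = [-gi + beta * pi for gi, pi in zip(new_g, p)]
        if dot(new_g, p) >= 0.0:               # not a descent direction: restart
            p = [-gi for gi in new_g]
        improvement, cost, g = cost - new_cost, new_cost, new_g
        if improvement < min_improvement:      # step IV: early stopping
            break
    return w
```

On a convex quadratic the search directions become mutually conjugate, so a two-parameter problem is solved in essentially two epochs; on the non-convex surfaces considered here the routine only finds the local minimum of the basin it starts in.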

6.6  Variable Pruning

Appendix identified the initial tag sets to be used as MVs, DVs and SVs. The goal of
this section is to develop a methodology for pruning these tag sets into the smallest set of
variables required to produce accurate models for the model definitions presented in
Section 6.2. At this point, we have exhausted our first-principles understanding of the
process, which was used to create the Essential Tag List. The variable pruning methodology
will therefore consist of training multiple models with different combinations of inputs, and
selecting the best models based on the performance criteria above. The methodology is an
educated direct search approach.

There  are  a  few  rules  and  assumptions  that  will  be  used  to  limit  the  search: 

1)  Only  one  variable  from  each  group  will  be  allowed  for  each  type. 

2)  Operators  do  not  want  to  manipulate  more  than  8 

3)  Variables  for  the  best  steady-state  model  will  be  the  best  variables  for 
dynamic  modeling. 

4)  Variables  for  the  best  MLP(15,5)  model  will  be  the  best  variables  for  all 
model  architectures  considered. 

Here, MLP(15,5) is defined as an MLP with 2 hidden layers, with 15 processing elements
in the first layer and 5 in the second. These assumptions are inherently flawed.
Given limited computational resources, however, they are reasonable assumptions.

The variable pruning methodology will consist of two algorithms. Both algorithms use
an MLP(15,5) with the steady-state CV model definitions, with one modification: the CO
variable has been moved from the SV set to the CV set, i.e., from the model's input to its
output. This modification was made to force the pruning algorithms to consider each
variable's effect on both NOx and CO.

6.6.1  Group Pruning Algorithm

The goal of the group pruning algorithm is to identify which variable groups in the
Essential Tag List have the greatest impact on the performance of the model. The group
pruning algorithm was implemented as follows:

I:    Train the MLP(15,5) $N^{train}$ times for each unique group in the
Essential Tag List

II:  For each group model, select the "best group" model as that with the
lowest cross-validation MSE

III:  Calculate the correlation of each "best group" model across the blind
testing dataset, and define the "removal group" model as the "best
group" model with the highest correlation

IV:  Remove all variables from the Essential Tag List that are in the group
associated with the "removal group" model

V:   Repeat steps I-IV until there are only $N^{groups}$ unique groups
remaining
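
The loop above can be sketched as a greedy backward elimination. The sketch makes one assumption that the steps leave implicit: the "group model" for a group is trained with that group's variables withheld, so the group whose removal leaves the highest blind-test correlation is the one pruned. The callable `train_once` is hypothetical; it stands for a single MLP(15,5) training run and is assumed to return a (cross-validation MSE, blind-test correlation) pair.

```python
def prune_groups(groups, train_once, n_train=10, n_keep=10):
    """Greedy group pruning.

    groups     : list of group names in the Essential Tag List
    train_once : hypothetical callable, train_once(input_groups) ->
                 (cv_mse, test_corr) for one training run
    n_train    : number of training runs per group model
    n_keep     : number of groups to retain
    """
    remaining = list(groups)
    removed = []
    while len(remaining) > n_keep:
        best_by_group = {}
        for candidate in remaining:
            # Assumption: the group model is trained without the candidate group.
            inputs = [g for g in remaining if g != candidate]
            runs = [train_once(inputs) for _ in range(n_train)]       # step I
            best_by_group[candidate] = min(runs, key=lambda r: r[0])  # step II
        # Step III: the removal group is the one whose best model correlates highest.
        removal = max(best_by_group, key=lambda g: best_by_group[g][1])
        remaining.remove(removal)                                     # step IV
        removed.append(removal)
    return remaining, removed
```

The cost of the loop is dominated by the calls to `train_once`, which is why the number of restarts and the number of groups directly set the CPU budget.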

This algorithm was run with $N^{train} = 10$ and $N^{groups} = 10$. Notice that there are a
total of 15 unique group models (ignoring NOx and CO), resulting in 150 model trainings
for the first iteration. This results in 15 "best group" models. The total number of model
trainings required by this algorithm for all iterations is therefore

$$ 10(15 + 14 + 13 + 12 + 11) = 650. \qquad (82) $$
The average training run requires 30 minutes on a 300 MHz Intel-based workstation,
requiring approximately 14 dedicated days of CPU time. It is easy to see the limitations of
direct search methodologies. There has been a lot of recent work in the literature applying
genetic algorithms to discrete optimization problems like variable and model architecture
selection. The author believes that these techniques have a great deal of merit, and will
result in significant improvements to model building methodologies. This is still a
research topic, however, and not the research topic chosen for this work. The "brute force"
method has been selected because of its robustness to parameterization.
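
The training count of Equation 82 and the CPU-time estimate can be checked with a few lines of arithmetic. The function name and argument names are introduced here for illustration only; the numbers are those quoted in the text.

```python
def total_trainings(n_groups=15, n_keep=10, n_train=10):
    """Training runs needed to prune from n_groups down to n_keep groups."""
    # One iteration per removed group: 15, 14, ..., 11 group models.
    return n_train * sum(range(n_keep + 1, n_groups + 1))


runs = total_trainings()            # 10 * (15 + 14 + 13 + 12 + 11)
cpu_hours = runs * 30 / 60          # 30 minutes per training run
cpu_days = cpu_hours / 24           # roughly 13.5, i.e. about 14 days
```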

The computational complexities of this algorithm and the algorithms discussed in
subsequent sections are managed using a distributed computing environment. Distributing the
training of a neural network is yet another research project. Rather than distributing the
training of a single training run, the multiple training runs were distributed across a
network of computers. The author had evening access to more than 50 Intel-based
workstations connected through a local area network. These machines were used in parallel to
run this and subsequent algorithms. This algorithm, for example, was run on a single
Saturday, when the machines were not in use.

Figure 16: Results of type pruning algorithm.

The results of this algorithm are provided in Figure 16. The algorithm was able to
remove 3 groups from the model without degrading its performance. In fact, there is an
increase in the generalization ability of the model. The 3 groups removed are listed in
Table 2 in the order they were removed.

Table 2: Types in order they were removed.

Order   Group
1       Fuel
2       SAS
3       PAS

There could be concern that the PAS group was removed using this algorithm. There is
an unbalanced number of variables in this group. It is not surprising that removing these
variables might increase the model's ability to generalize. Given our restrictions on the
number of MVs, however, the PASs are not good candidates for control. They were
included primarily to see if the models required them as DVs. Since they can be removed
with only minimal impact on modeling results, there is little reason at this time to
investigate their impact further. They are good candidates for future expansion of our
controllers, however.

6.6.2  Representation  Pruning  Algorithm 

The goal of the representation pruning algorithm is to identify the best variable
representation within each remaining group. The representation pruning algorithm was
implemented as follows:

I:    Select the first group in the reduced Essential Tag List

II:   If there is only one unique representation in the group then select the
next group and goto step II

III:  For each unique representation in the selected group, train the
MLP(15,5) 10 times with the variables for this representation as the
only input variables from the selected group; the resulting models will
be called "representation" models

IV:  For each "representation" model, the training result with the lowest
cross-validation MSE is selected as the "best representation" model

V:  Calculate the correlation of each "best representation" model across
the blind testing dataset, and remove all variables associated with
every "representation" model except for the "best representation"
model with the highest correlation

VI:  Select the next group and goto step II
Notice  that  there  are  6  remaining  groups  with  more  than  one  unique  representation, 
containing  a  total  of  12  unique  representations.  This  algorithm,  therefore,  requires  120 
model  trainings.  The  results  after  pruning  the  variable  representations  for  each  of  the  4 
groups  are  illustrated  in  Figure  17.  As  illustrated  in  this  figure,  each  group  was  pruned 
down  to  a  single  variable  representation  without  loss  of  fidelity. 
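
The representation pruning steps can be sketched as a single pass over the groups. As before, `train_once` is a hypothetical stand-in for one MLP(15,5) training run, assumed to return a (cross-validation MSE, blind-test correlation) pair; the dictionary layout is likewise an assumption made for this example.

```python
def prune_representations(groups, train_once, n_train=10):
    """Keep one representation per group.

    groups     : dict mapping group name -> list of candidate representations
    train_once : hypothetical callable, train_once(selection) -> (cv_mse, test_corr),
                 where selection maps each group to the representations used as inputs
    """
    selected = {name: list(reps) for name, reps in groups.items()}
    for name, reps in groups.items():
        if len(reps) < 2:                      # step II: nothing to choose between
            continue
        best = {}
        for rep in reps:                       # step III
            trial = dict(selected)
            trial[name] = [rep]                # only this representation from the group
            runs = [train_once(trial) for _ in range(n_train)]
            best[rep] = min(runs, key=lambda r: r[0])       # step IV
        winner = max(best, key=lambda r: best[r][1])        # step V
        selected[name] = [winner]
    return selected
```

With 10 restarts per representation, the number of calls to `train_once` is 10 times the number of representations in groups that offer a choice, which is how 12 representations lead to 120 trainings.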

Figure 17: Representation pruning algorithm results.


6.6.3  Final  Variable  Sets 

The final variable selections after the group and representation pruning algorithms are
provided in Table 3. These selections are of interest from a first-principles perspective.
Notice that the MVs contain the primary controls over gross air, i.e., the FD fans, and
recall from Chapter 3 that excess gross airflow is a primary cause of NOx formation. The
MVs also contain over-fire air (OFA) and gas recirculation (GR) air. Recall from
Section 2.5 "Fossil-Fired Power Generation" that the primary reason for installing the
OFA and GR subsystems is to better control NOx.


Table 3: Final variable selections after pruning.

Manipulated Variables      Disturbance Variables      State Values          Control Variables
1/3 OFA Damper Pos         Ambient Air Press          Sec Air Temp Side A   CEM NOx
2/3 OFA Damper Pos         Ambient Air Temp           Sec Air Temp Side B
FD Fan 2A Inlet Vane       Bnr Atm Stm Press          CEM CO
FD Fan 2B Inlet Vane       Bnr Atm Stm Temp           Generated MW
GR Fan 2A Inlet Dmpr Pos   CEM Barometric Pressure    Windbox Pressure
GR Fan 2B Inlet Dmpr Pos   Cond Back Pres - Side A
GR Fan Hppr Dmpr A Pos     Cond Back Pres - Side B
GR Fan Hppr Dmpr B Pos     Fuel Gas Flow Indication
                           Fuel Oil Flow Indication
                           Fuel Temp Rted
                           Furnace Pressure

There are also some surprises. Consider, for instance, burner atomizing steam pressure
and temperature. During the initial operator interviews it was thought that the atomizing
steam would have a nominal effect on combustion, but its impact on the models
developed has been significant. We later verified this impact by manipulating these variables
at the plant.


6.7  Architecture  Selection 

The final variable selections, identified through the pruning algorithms in the last
section, will be considered the best variables for all model architectures. The author
recognizes that the best variables could be a function of the model architecture. The
computational complexity involved in variable pruning, however, makes finding the
topology-specific variables impractical.

There are a number of architecture-specific parameters that will require selection for
each architecture, however. Identifying the best architecture parameters will employ a
methodology similar to that used for variable pruning, i.e., a direct search of the
feasible parameter space. The following presents a detailed design for the model
architectures investigated in the study, along with the results used to identify their best
parameterization.

6.7.1  ARMA  Model 

The ARMA model considered in this study is given by

$$ y(t) = \sum_{n=1}^{N_n} a_n\, y(t-n) + \sum_{m=0}^{N_m} b_m\, u(t-m) + b, \qquad (83) $$

where $u = \{mv, dv\} \in \mathbb{R}^{N_u}$ are the model inputs, $a_n\ \forall n$ are the
auto-regressive coefficients, $b_m\ \forall m$ are the moving average coefficients, $N_n$ is
the number of auto-regressive taps, $N_m$ is the number of moving average taps, and $b$ is a
constant vector.
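
A one-step prediction with a model of this form can be sketched as follows, for the single-CV case in which the output is a scalar. The argument layout (most recent sample last, one coefficient vector per input lag) is an assumption chosen for this example rather than a description of the software used in the study.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def arma_predict(y_hist, u_hist, a, b, c):
    """One-step ARMA prediction for a single output.

    y_hist : past outputs, y_hist[-1] is y(t-1), y_hist[-2] is y(t-2), ...
    u_hist : input vectors, u_hist[-1] is u(t), u_hist[-2] is u(t-1), ...
    a      : auto-regressive coefficients, a[n-1] multiplies y(t-n)
    b      : moving average coefficient vectors, b[m] multiplies u(t-m)
    c      : constant term
    """
    ar = sum(a[n - 1] * y_hist[-n] for n in range(1, len(a) + 1))
    ma = sum(dot(b[m], u_hist[-1 - m]) for m in range(len(b)))
    return ar + ma + c


# Two auto-regressive taps and input lags 0 and 1, with two inputs.
prediction = arma_predict(
    y_hist=[2.0, 4.0],
    u_hist=[[1.0, 1.0], [2.0, 3.0]],
    a=[0.5, 0.25],
    b=[[1.0, 2.0], [0.5, 0.0]],
    c=0.1,
)
```

Because the moving average sum starts at lag 0, a model with a given number of moving average taps carries one more coefficient vector than that number.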

6.7.1.1  Parameter  selection 

Given the best set of MVs and DVs from variable pruning and only a single CV, there
are only two parameters to be identified for the ARMA model, $N_n$ and $N_m$. The algorithm
used to determine $N_n$ was as follows:

I:    Set $N_n = 0$

II:   Train ARMA($N_n$, 1)

III:  Increment $N_n$ by 1 and repeat II until $N_n \ge 10$

IV:  Calculate the correlation of each model in $\{ARMA(N_n, 1)\}_{\forall N_n}$ across
the blind testing dataset, and assign $N_n^*$, the optimum number of
auto-regressive taps, according to the model with the highest correlation

Figure 18: Results of auto-regressive taps search algorithm for the NOx ARMA Model.

Figure 18 illustrates the results of this algorithm for the dCVModel. For the dCVModel
definition, ARMA(5, 1) was chosen as the model with the highest correlation, taking
into consideration our desire to minimize the number of taps. This algorithm was
repeated for the dSVModel definition, and the combined results are summarized in Table
4. Notice that only the dynamic model definitions have been considered here, since the
ARMA is a dynamic model.


Table 4: Results of auto-regressive tap search algorithm for all dynamic models.

Model      PES   Train NMSE   Train R
dCVModel   5     0.903        0.194
dSVModel   4     0.932        0.178

Given the optimum number of auto-regressive taps, $N_n^*$, for each model definition, a
similar algorithm was used to determine the optimal number of moving average taps, $N_m^*$.
This algorithm can formally be stated as follows:

I:    Set $N_m = 1$

II:   Train the ARMA($N_n^*$, $N_m$)

III:  Increment $N_m$ by 1 and repeat II until $N_m = 10$

IV:  Calculate the correlation of each model in $\{ARMA(N_n^*, N_m)\}_{\forall N_m}$
across the blind testing dataset, and assign $N_m^*$, the optimum number
of moving average taps, according to the model with the highest
correlation

Figure 19: Results of the moving average tap search algorithm for NOx ARMA Model.

Figure 19 illustrates the results of this algorithm for the dCVModel. For the dCVModel
definition, ARMA(5, 3) produced the model with the highest correlation. Once
again, the algorithm was repeated for each of the dynamic model definitions, and the
results are summarized in Table 5. The models summarized in this table will be taken as
the best ARMA models for their corresponding model definitions.

Table 5: Results of moving average tap search algorithm for all dynamic models.

Model      PES   Train NMSE   Train R
dCVModel   3     0.888        0.225
dSVModel   4     0.902        0.219

6.7.2  Multi-layer  Perceptron 

The multi-layer perceptron (MLP) architecture considered in this study will consist of
2 hidden layers with tanh processing elements (PEs), and an output layer with a linear PE.
Formally, the architecture is given by

$$ y = W^{y}\,\sigma\!\left(W^{h2}\,\sigma\!\left(W^{h1} u + b^{h1}\right) + b^{h2}\right) + b^{y}, \qquad (84) $$

where $u = \{mv, dv\} \in \mathbb{R}^{N_u}$ are the model inputs,
$W^{h1} \in \mathbb{R}^{N_{h1} \times N_u}$ is the matrix of weights for the first hidden
layer, $b^{h1} \in \mathbb{R}^{N_{h1}}$ are the bias values for the first hidden layer,
$W^{h2} \in \mathbb{R}^{N_{h2} \times N_{h1}}$ is the matrix of weights for the second hidden
layer, $b^{h2} \in \mathbb{R}^{N_{h2}}$ are the bias values for the second hidden layer,
$W^{y} \in \mathbb{R}^{N_y \times N_{h2}}$ is the matrix of weights for the output layer,
$b^{y} \in \mathbb{R}^{N_y}$ are the bias values for the output layer, $N_u$ is the
total number of MVs and DVs in the input layer, $N_{h1}$ is the number of PEs in the first
hidden layer, $N_{h2}$ is the number of PEs in the second hidden layer, $N_y$ is the number
of PEs in the output layer, and $\sigma$ is the tanh activation function.
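
The forward pass of this architecture can be sketched in a few lines. The sketch uses plain Python lists so that it is self-contained; the function names are introduced here for illustration, and the weights in the usage example are arbitrary values, not trained parameters from this study.

```python
import math


def affine(W, x, b):
    """Return W x + b for a weight matrix W (list of rows) and bias vector b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp_forward(u, W1, b1, W2, b2, Wy, by):
    """Two tanh hidden layers followed by a linear output layer."""
    h1 = [math.tanh(v) for v in affine(W1, u, b1)]    # first hidden layer
    h2 = [math.tanh(v) for v in affine(W2, h1, b2)]   # second hidden layer
    return affine(Wy, h2, by)                         # linear output PE(s)


# Arbitrary illustrative weights: 2 inputs, 3 PEs, 2 PEs, 1 output.
y = mlp_forward(
    u=[0.5, -1.0],
    W1=[[0.1, 0.2], [0.0, -0.3], [0.4, 0.1]], b1=[0.0, 0.1, -0.1],
    W2=[[0.2, -0.1, 0.3], [0.5, 0.0, -0.2]], b2=[0.0, 0.0],
    Wy=[[1.0, -1.0]], by=[0.25],
)
```

In this notation MLP(15,5) corresponds to a first weight matrix with 15 rows and a second with 5 rows.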

6.7.2.1  Parameter  selection 

Given the best set of MVs and DVs from variable pruning and only a single CV, there
are only two parameters to be identified for the MLP, $N_{h1}$ and $N_{h2}$. The algorithm
used to determine $N_{h1}^*$ was as follows:

I:    Set $N_{h1} = 5$

II:   Train the MLP($N_{h1}$, 5) 10 times

III:  Select MLP*($N_{h1}$, 5) as the model with the lowest cross-validation
MSE from the 10 training results

IV:  Increment $N_{h1}$ by 2 and repeat II until $N_{h1} = 25$

V:  Calculate the correlation of each model in $\{MLP^*(N_{h1}, 5)\}_{\forall N_{h1}}$
across the blind testing dataset, and assign $N_{h1}^*$, the optimum number
of hidden PEs in the first layer, according to the model with the
highest correlation
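
The two-stage selection used in this search, and in the analogous searches that follow, can be sketched generically: restarts at a fixed size are ranked by cross-validation MSE, and the surviving models are then ranked across sizes by blind-test correlation. The training routine below is a toy stand-in that peaks at 19 PEs by construction; it is purely hypothetical and exists only to make the sketch executable.

```python
import random


def sweep_hidden_size(sizes, train_once, n_train=10):
    """Return the size whose best restart has the highest blind-test correlation.

    train_once : callable, train_once(size) -> (cv_mse, test_corr) for one run
    """
    best_per_size = {}
    for size in sizes:
        runs = [train_once(size) for _ in range(n_train)]
        # Best restart at this size: lowest cross-validation MSE.
        best_per_size[size] = min(runs, key=lambda run: run[0])
    # Across sizes: highest correlation on the blind testing dataset.
    best_size = max(best_per_size, key=lambda s: best_per_size[s][1])
    return best_size, best_per_size


_rng = random.Random(0)


def toy_train_once(size):
    """Hypothetical stand-in for a training run (not a real model)."""
    distance = abs(size - 19) / 100
    return (0.1 + distance + 0.01 * _rng.random(),
            0.9 - distance - 0.01 * _rng.random())


best_size, table = sweep_hidden_size(range(5, 26, 2), toy_train_once)
```

Using separate datasets for the two stages keeps the choice among restarts from being made on the same data that ranks the architectures.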

Figure 20: Results of hidden layer #1 PE search algorithm for the NOx MLP Model.

Figure 20 illustrates the results of this algorithm for the ssCVModel. For the ssCVModel
definition, MLP*(19, 5) produced the model with the highest correlation on the
blind test set. This algorithm was repeated for each of the steady-state model definitions,
and the results are summarized in Table 6. Notice that only the steady-state model
definitions have been considered here, since the MLP is a static model.


Table 6: Results of hidden layer #1 PE search algorithm for all steady-state models.

Model        PES   Train NMSE   Train R   Test NMSE   Test R
ssCVModel    19    0.130        0.876     0.182       0.793
ssSVModel    15    0.252        0.766     0.331       0.712
sslSVModel   17    0.087        0.892     0.143       0.844
sslMVModel   23    0.293        0.698     0.354       0.637

Given the optimum number of hidden processing elements in the first layer, $N_{h1}^*$, for
each model definition, a similar algorithm was used to determine the optimal number of
hidden PEs in the second hidden layer, $N_{h2}^*$. This algorithm can formally be stated as
follows:

I:    Set $N_{h2} = 2$

II:   Train the MLP($N_{h1}^*$, $N_{h2}$) 10 times

III:  Select MLP*($N_{h1}^*$, $N_{h2}$) as the model with the lowest cross-validation
MSE from the 10 training results

IV:  Increment $N_{h2}$ by 1 and repeat II until $N_{h2} = 10$

V:   Calculate the correlation of each model in $\{MLP^*(N_{h1}^*, N_{h2})\}_{\forall N_{h2}}$
across the blind testing dataset, and assign $N_{h2}^*$, the optimum number
of hidden PEs in the second layer, according to the model with the
highest correlation

Figure 21: Results of the hidden layer #2 PE search algorithm for NOx MLP Model.

Figure 21 illustrates the results of this algorithm for the ssCVModel. For the ssCVModel
definition, MLP*(19, 7) produced the model with the highest correlation on the
blind test set. Once again, the algorithm was repeated for each of the model definitions,
and the results are summarized in Table 7. The models summarized in this table will be
taken as the best MLP models for their corresponding model definitions.

Table 7: Results of hidden layer #2 PE search algorithm for all steady-state models.

Model        PES   Train NMSE   Train R   Test NMSE   Test R
ssCVModel    7     0.135        0.895     0.298       0.801
ssSVModel    4     0.265        0.786     0.401       0.723
sslSVModel   3     0.219        0.791     0.365       0.732
sslMVModel   9     0.296        0.705     0.336       0.641

6.7.3  Time-Delayed  Neural  Network 

As presented in Chapter 2, the most commonly applied time-delayed neural network
(TDNN) is simply an MLP with a tapped-delay-line (TDL) preprocessor at its input. This
architecture can be formally presented as follows

$$ y = W^{y}\,\sigma\!\left(W^{h2}\,\sigma\!\left(W^{h1}\,TDL(u) + b^{h1}\right) + b^{h2}\right) + b^{y}, \qquad (85) $$

where $TDL: \mathbb{R}^{N_u} \rightarrow \mathbb{R}^{N_u \times N_T}$ is the TDL mapping, and
$N_T$ is the number of taps.
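
A tapped delay line of this kind can be sketched as a function that gathers the most recent samples of each input. The sketch assumes that the first tap holds the current sample, which the mapping above does not specify; the function name and argument layout are introduced only for illustration.

```python
def tapped_delay_line(u_series, t, n_taps):
    """Return one row per input holding [u_i(t), u_i(t-1), ..., u_i(t-n_taps+1)].

    u_series : list of input vectors indexed by time
    t        : current time index (must satisfy t >= n_taps - 1)
    Assumption: tap 0 is the current sample u(t).
    """
    if t < n_taps - 1:
        raise ValueError("not enough history for the requested number of taps")
    n_inputs = len(u_series[t])
    return [[u_series[t - k][i] for k in range(n_taps)] for i in range(n_inputs)]


# Two inputs observed over four time steps, three taps at t = 3.
history = [[1, 10], [2, 20], [3, 30], [4, 40]]
window = tapped_delay_line(history, t=3, n_taps=3)
flat = [value for row in window for value in row]   # vector fed to the MLP
```

Flattening the window shows why the first-layer weight matrix of a TDNN grows with the number of taps.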

6.7.3.1  Parameter  selection 

The TDNN architecture, therefore, adds a single additional parameter to the MLP
architecture, $N_T$. Clearly, the MLP parameters $N_{h1}$ and $N_{h2}$ will be a function of
$N_T$. Once again, however, $N_T$ will also be a function of these parameters, and this
study will simplify this problem by assuming that the best parameters for the MLP will also
be optimal for the TDNN.

Given the optimum number of hidden processing elements for the MLP, $N_{h1}^*$ and
$N_{h2}^*$, for each model definition, the algorithm used to determine $N_T^*$ was
implemented as follows:

I:    Set $N_T = 2$

II:   Train the TDNN($N_{h1}^*$, $N_{h2}^*$, $N_T$) 10 times

III:  Select TDNN*($N_{h1}^*$, $N_{h2}^*$, $N_T$) as the model with the lowest
cross-validation MSE from the 10 training results

IV:  Increment $N_T$ by 2 and repeat II until $N_T = 20$

V:   Calculate the correlation of each model in
$\{TDNN^*(N_{h1}^*, N_{h2}^*, N_T)\}_{\forall N_T}$ across the blind testing dataset, and
assign $N_T^*$, the optimum number of taps, according to the model with
the highest correlation

Figure 22: Results of tap search algorithm for NOx TDNN Model.

Figure 22 illustrates the results of this algorithm. For the dCVModel definition,
TDNN*(19, 7, 8) produced the model with the highest correlation on the blind test set.
The algorithm was repeated for each of the dynamic model definitions, and the results are
summarized in Table 8. The models summarized in this table will be taken as the best
TDNN models for their corresponding model definitions.

Table 8: Results of tap search algorithm for all dynamic models.

Model      Taps   Train NMSE   Test NMSE   Train R   Test R
dCVModel   8      0.067        0.124       0.928     0.878
dSVModel   4      0.116        0.207       0.889     0.796

6.7.4  Gamma  Neural  Network 

The Gamma Neural Network (GNN) is similar to the TDNN considered above, with
two fundamental differences: 1) the TDL feed-forward memory mechanism is replaced
with an IIR Gamma Filter (GF), and 2) the memory is embedded within the hidden layers
of the architecture. This study will consider an architecture containing two GFs, one in the
input layer, and the second in the first hidden layer. This architecture can be formally
presented as follows

$$ y = W^{y}\,\sigma\!\left(W^{h2}\,G_h\!\left(\sigma\!\left(W^{h1}\,G_u(u) + b^{h1}\right)\right) + b^{h2}\right) + b^{y}, \qquad (86) $$

where $G_u: \mathbb{R}^{N_u} \rightarrow \mathbb{R}^{N_u \times N_{Gu}}$ is the input layer GF,
$G_h: \mathbb{R}^{N_{h1}} \rightarrow \mathbb{R}^{N_{h1} \times N_{Gh}}$ is the hidden
layer GF, and these memories have $N_{Gu}$ and $N_{Gh}$ memory taps, respectively.
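
The behavior of a gamma memory can be illustrated with the recursion commonly given in the literature, in which each tap is a leaky integrator fed by the previous tap. This excerpt does not reproduce the thesis's own definition of the GF, so the recursion, the parameter name `mu`, and the convention that tap 0 holds the current input are all assumptions of this sketch.

```python
def gamma_memory_step(state, u, mu):
    """Advance a scalar gamma memory by one time step.

    state : tap values from the previous time step, state[0] being tap 0
    u     : current input sample
    mu    : memory parameter in (0, 1]; mu = 1 reduces to a tapped delay line

    Assumed recursion (standard gamma memory form):
        x_0(t) = u(t)
        x_k(t) = (1 - mu) * x_k(t-1) + mu * x_{k-1}(t-1)
    """
    new_state = [u]
    for k in range(1, len(state)):
        new_state.append((1.0 - mu) * state[k] + mu * state[k - 1])
    return new_state


# With mu = 1 the memory behaves exactly like a delay line.
state = [0.0, 0.0, 0.0]
for sample in (1.0, 2.0, 3.0):
    state = gamma_memory_step(state, sample, mu=1.0)
```

Values of `mu` below 1 trade temporal resolution for memory depth, which is the usual motivation for replacing a delay line with a gamma memory when the number of taps must stay small.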

6.7.4.1  Parameter  selection 

The GNN, therefore, adds two additional parameters to the MLP architecture, $N_{Gu}$ and
$N_{Gh}$. We again simplify the parameter selection problem by assuming that the best
parameters for the MLP will also be optimal for the GNN.

Given the optimum number of hidden PEs for the MLP for each model definition, $N_{h1}^*$
and $N_{h2}^*$, the algorithm used to determine $N_{Gu}^*$ was implemented as follows:

I:    Set $N_{Gu} = 2$

II:   Train a GNN($N_{h1}^*$, $N_{h2}^*$, $N_{Gu}$, 4) 10 times

III:  Select GNN*($N_{h1}^*$, $N_{h2}^*$, $N_{Gu}$, 4) as the model with the lowest
cross-validation MSE from the 10 training results

IV:  Increment $N_{Gu}$ by 1 and repeat II until $N_{Gu} = 10$

V:   Calculate the correlation of each model in
$\{GNN^*(N_{h1}^*, N_{h2}^*, N_{Gu}, 4)\}_{\forall N_{Gu}}$ across the blind testing
dataset, and assign $N_{Gu}^*$, the optimum number of taps, according to the model with
the highest correlation


[Plot: Train R and Test R series; horizontal axis labeled PES.]

Figure  23:  Results  of  taps  search  algorithm  for  NOx  GNN  model. 

Figure 23 illustrates the results of this algorithm. For the dCVModel definition,
GNN*(19, 7, 3, 4) produced the model with the highest correlation on the blind test set.
This algorithm was repeated for each of the dynamic model definitions, and the results are
summarized in Table 9.

Table 9: Results of taps search algorithm for all dynamic models.

Model      Taps   Train NMSE   Test NMSE   Train R   Test R
dCVModel   3      0.065        0.156       0.936     0.846
dSVModel   5      0.129        0.202       0.875     0.795

Given the optimum number of memory taps in the input layer, $N_{Gu}^*$, the optimal
number of memory taps for the first hidden layer was determined as follows:

I:    Set $N_{Gh} = 2$

II:   Train the GNN($N_{h1}^*$, $N_{h2}^*$, $N_{Gu}^*$, $N_{Gh}$) 10 times

III:  Select GNN*($N_{h1}^*$, $N_{h2}^*$, $N_{Gu}^*$, $N_{Gh}$) as the model with the lowest
cross-validation MSE from the 10 training results

IV:  Increment $N_{Gh}$ by 1 and repeat II until $N_{Gh} = 10$

V:   Calculate the correlation of each model in
$\{GNN^*(N_{h1}^*, N_{h2}^*, N_{Gu}^*, N_{Gh})\}_{\forall N_{Gh}}$ across the blind testing
dataset, and assign $N_{Gh}^*$, the optimum number of memory taps in the first hidden layer,
according to the model with the highest correlation

Figure 24: Results of hidden taps search algorithm for NOx GNN model.

Figure 24 illustrates the results of this algorithm for the dCVModel. For the dCVModel
definition, GNN*($N_{h1}^*$, $N_{h2}^*$, 3, 4) produced the model with the highest
correlation on the blind test set. Once again, the algorithm was repeated for each of the
model definitions, and the results are summarized in Table 10. The models summarized in this
table will be taken as the best GNN models for their corresponding model definitions.


Table 10: Results of hidden taps search algorithm for all dynamic models.

Model      Taps   Train NMSE   Test NMSE   Train R   Test R
dCVModel   4      0.034        0.086       0.960     0.908
dSVModel   5      0.045        0.095       0.952     0.903

6.7.5  Nonlinear  State-Space  Model 

As presented in Chapter 2, the nonlinear state-space (NLSS) model actually consists of
two separate models: a state evolution model and an output observation model. Here we
consider the case where both of these models are MLPs, each with a single hidden layer.
Formally, the NLSS model considered for this study is given by

$$ x(t) = W^{x}\,\sigma\!\left(W^{hx} z_x(t) + b^{hx}\right) + b^{x} \qquad (87) $$

$$ y(t) = W^{y}\,\sigma\!\left(W^{hy} z_y(t) + b^{hy}\right) + b^{y} \qquad (88) $$

where $z_x(t) = \{u(t), x(t-1)\} \in \mathbb{R}^{N_u + N_x}$ and
$z_y(t) = \{u(t), x(t)\} \in \mathbb{R}^{N_u + N_x}$ are state representation vectors;
$W^{hx} \in \mathbb{R}^{N_{hx} \times (N_u + N_x)}$, $b^{hx} \in \mathbb{R}^{N_{hx}}$,
$W^{x} \in \mathbb{R}^{N_x \times N_{hx}}$ and $b^{x} \in \mathbb{R}^{N_x}$
are the weights of the state evolution model;
$W^{hy} \in \mathbb{R}^{N_{hy} \times (N_u + N_x)}$, $b^{hy} \in \mathbb{R}^{N_{hy}}$,
$W^{y} \in \mathbb{R}^{N_y \times N_{hy}}$ and $b^{y} \in \mathbb{R}^{N_y}$ are the weights
of the output observation model; $N_u$ is the total number of MVs and DVs in the input
layer; $N_{hx}$ is the number of hidden PEs in the state evolution model; $N_{hy}$ is the
number of hidden PEs in the output observation model; $N_y$ is the number of CVs in the
output; and $\sigma$ is the tanh activation function.
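
One time step of a model with this structure can be sketched as two single-hidden-layer MLPs applied in sequence: the first maps the current inputs and the previous state to the new state, and the second maps the current inputs and the new state to the output. The tuple layout for each network's weights and the function names are assumptions made for this example.

```python
import math


def affine(W, x, b):
    """Return W x + b for a weight matrix W (list of rows) and bias vector b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def one_hidden_mlp(z, W_hidden, b_hidden, W_out, b_out):
    """Single tanh hidden layer followed by a linear output layer."""
    hidden = [math.tanh(v) for v in affine(W_hidden, z, b_hidden)]
    return affine(W_out, hidden, b_out)


def nlss_step(u, x_prev, state_net, output_net):
    """Advance the state-space model by one step.

    state_net, output_net : tuples (W_hidden, b_hidden, W_out, b_out)
    """
    x = one_hidden_mlp(u + x_prev, *state_net)   # state evolution from {u(t), x(t-1)}
    y = one_hidden_mlp(u + x, *output_net)       # observation from {u(t), x(t)}
    return x, y


# Arbitrary illustrative weights: one input, one hidden state, one output.
state_net = ([[1.0, 1.0]], [0.0], [[1.0]], [0.0])
output_net = ([[0.0, 1.0]], [0.0], [[2.0]], [1.0])
x, y = nlss_step([0.5], [0.0], state_net, output_net)
```

Iterating `nlss_step` over a sequence of inputs, feeding each returned state back in as `x_prev`, produces the model's response over time.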

There are therefore 3 parameters which need to be determined for the NLSS model: 1)
the number of hidden states $N_x$, 2) the number of hidden PEs in the state evolution model
$N_{hx}$, and 3) the number of hidden PEs in the output observation model $N_{hy}$. Once
again, these parameters will be determined using an exhaustive search methodology. We begin
by fixing the number of hidden PEs in both models to 4, and determine the number of hidden
states. Defining NLSS($N_x$, $N_{hx}$, $N_{hy}$) as an NLSS model as described above, the
number of hidden states can be determined as follows:

I:    Set $N_x = 2$

II:   Train the NLSS($N_x$, 4, 4) 10 times

III:  Select NLSS*($N_x$, 4, 4) as the model with the lowest cross-validation
MSE from the 10 training results

IV:  Increment $N_x$ by 1 and repeat II until $N_x = 12$

V:   Calculate the correlation of each model in $\{NLSS^*(N_x, 4, 4)\}_{\forall N_x}$
across the blind testing dataset, and assign $N_x^*$, the optimum number of
hidden states, according to the model with the highest correlation
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Figure  25:  Results  of  hidden  states  search  algorithm  for  NOx  NLSS  model. 

Figure  25  illustrates  the  results  of  this  algorithm.  For  the  dCVModel  definition, 
NLSS*  (6, 4, 4)  produced  the  model  with  the  highest  correlation  on  the  blind  test  set.  The 
algorithm  was  repeated  for  each  dynamic  model  definition,  and  the  results  are  summarized 
in Table 11.

Table 11: Results of hidden states search algorithm for all dynamic models.

Model      States   Train NMSE   Test NMSE   Train R   Test R
dCVModel   6        0.090        0.221       0.906     0.783
dSVModel   9        0.120        0.300       0.879     0.695

Given the optimal number of hidden states $N_x^*$, the optimal number of hidden PEs in the state evolution model can be determined as follows:

I: Set $N_{hx} = 2$

II: Train the $NLSS(N_x^*, N_{hx}, 4)$ 10 times

III: Select $NLSS^*(N_x^*, N_{hx}, 4)$ as the model with the lowest cross-validation MSE from the 10 training results

IV: Increment $N_{hx}$ by 1 and repeat II until $N_{hx} = 12$

V: Calculate the correlation of each model in $\{NLSS^*(N_x^*, N_{hx}, 4)\}\ \forall N_{hx}$ across the blind testing dataset, and assign $N_{hx}^*$ according to the model with the highest correlation

Figure 26: Results of state hidden PEs search algorithm for NOx NLSS model.

Figure  26  illustrates  the  results  of  this  algorithm.  For  the  dCVModel  definition, 

NLSS* (6,  9, 4)  produced  the  model  with  the  highest  correlation  on  the  blind  test  set.  The 
algorithm  was  repeated  for  each  of  the  model  definitions,  and  the  results  are  summarized 
in  Table  12. 


Table 12: Results of state hidden PEs search algorithm for all dynamic models.

Model      State PEs   Train NMSE   Test NMSE   Train R   Test R
dCVModel   9           0.017        0.129       0.988     0.868
dSVModel   11          0.047        0.191       0.950     0.806

Finally, given the optimal number of hidden states $N_x^*$ and hidden PEs in the state evolution model $N_{hx}^*$, the optimal number of hidden PEs in the output observation model can be determined as follows:

I: Set $N_{hy} = 2$

II: Train the $NLSS(N_x^*, N_{hx}^*, N_{hy})$ 10 times

III: Select $NLSS^*(N_x^*, N_{hx}^*, N_{hy})$ as the model with the lowest cross-validation MSE from the 10 training results

IV: Increment $N_{hy}$ by 1 and repeat II until $N_{hy} = 12$

V: Calculate the correlation of each model in $\{NLSS^*(N_x^*, N_{hx}^*, N_{hy})\}\ \forall N_{hy}$ across the blind testing dataset, and assign $N_{hy}^*$ according to the model with the highest correlation

Figure 27: Results of output hidden PEs search algorithm for NOx NLSS model.


Figure 27 illustrates the results of this algorithm. For the dCVModel definition, NLSS*(6, 9, 5) produced the model with the highest correlation on the blind test set. The algorithm was repeated for each of the model definitions, and the results are summarized in Table 13.


Table 13: Results of output hidden PEs search algorithm for all dynamic models.

Model      Output PEs   Train NMSE   Test NMSE   Train R   Test R
dCVModel   5            0.030        0.094       0.965     0.896
dSVModel   4            0.070        0.135       0.926     0.864
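
The three search stages described in this section share one pattern: sweep a single architecture parameter from 2 to 12, train 10 models per value, keep the run with the lowest cross-validation MSE, and then choose the value whose kept model has the highest blind-test correlation. A minimal Python sketch of that pattern follows. The names `search_dimension`, `staged_search` and the `train` callback are hypothetical, and `toy_train` is a purely synthetic stand-in for actual NLSS training (its optimum is placed at (6, 9, 5) only to mirror the dCVModel outcome reported above); none of this is the code used in the study.

```python
import random
from typing import Callable, Dict, List, Tuple

Arch = Tuple[int, int, int]  # (N_x, N_hx, N_hy)
Trainer = Callable[[Arch, int], Dict[str, float]]


def search_dimension(train: Trainer, arch: List[int], dim: int,
                     lo: int = 2, hi: int = 12, n_runs: int = 10) -> int:
    """Run steps I-V for one architecture parameter (index `dim`)."""
    best_per_size: Dict[int, Dict[str, float]] = {}
    for size in range(lo, hi + 1):  # steps I and IV: sweep the parameter
        candidate = list(arch)
        candidate[dim] = size
        # Step II: train the same architecture n_runs times.
        runs = [train(tuple(candidate), seed) for seed in range(n_runs)]
        # Step III: keep the run with the lowest cross-validation MSE.
        best_per_size[size] = min(runs, key=lambda r: r["cv_mse"])
    # Step V: pick the size whose kept model correlates best on the blind set.
    return max(best_per_size, key=lambda s: best_per_size[s]["test_r"])


def staged_search(train: Trainer) -> Arch:
    """Search N_x, then N_hx, then N_hy, freezing each optimum in turn."""
    arch = [4, 4, 4]  # hidden PEs start fixed at 4; arch[0] is overwritten
    for dim in range(3):
        arch[dim] = search_dimension(train, arch, dim)
    return tuple(arch)


def toy_train(arch: Arch, seed: int) -> Dict[str, float]:
    """Synthetic stand-in for training; NOT a real NLSS model."""
    n_x, n_hx, n_hy = arch
    rng = random.Random(n_x * 1_000_000 + n_hx * 10_000 + n_hy * 100 + seed)
    mismatch = 0.02 * (abs(n_x - 6) + abs(n_hx - 9) + abs(n_hy - 5))
    noise = rng.uniform(0.0, 0.005)  # run-to-run variation from initialization
    return {"cv_mse": mismatch + noise, "test_r": 1.0 - mismatch - noise}
```

Note that the procedure searches one parameter at a time rather than the full three-dimensional grid, so it requires 3 x 11 x 10 trainings instead of 11^3 x 10, at the cost of possibly missing interactions between the parameters.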

6.8  Analysis 

Figure 28 summarizes the final modeling results after architecture selection. Recall that the objective of this section was to find the most accurate model for the six model definitions required to implement our four control designs. The best models identified for each definition are:

1) Steady-State SV Model: MLP*(15, 4)

2) Steady-State CV Model: MLP*(19, 7)

3) Steady-State ISV Model: MLP*(17, 3)

4) Steady-State IMV Model: MLP*(23, 9)

5) Dynamic SV Model: GM*(15, 4, 3, 4)

6) Dynamic CV Model: GM*(19, 7, 5, 5)


Figure  28:  Best  models  for  all  model  definitions  by  architecture. 


The  next  chapter  will  implement  each  control  design  using  these  models. 


CHAPTER  7 
CONTROLLER  IMPLEMENTATIONS 

This chapter will implement the four control designs presented in Chapter 4, using the "best" reference models developed in Chapter 6, and identify the "best" control design for meeting the objectives outlined in Section 4.4 "Performance Criteria." The "best" controller will be determined as follows:

1) Using dynamic models as simulators for the plant, controller performance is quantified offline.

2) Allowing the controllers to manipulate actual plant values, controller performance is quantified online.

7.1  Offline  Quantification 

To quantify the performance of the four control designs offline, a common dataset and plant simulator need to be selected. The dataset should not include data that was used during training of the controllers or their underlying reference models. Recall that a blind 1-week dataset was set aside during modeling for testing and that none of the models have ever seen this dataset during training. In addition, the dynamic process models chosen for offline quantification should not be models that were used as reference models for any of the control designs. Since the best models developed in Chapter 6, i.e., the models with the lowest cross-validation MSE from 10 separate trainings, were used as reference models, the offline quantification will use the second best models as the dynamic process models.

Offline quantification will apply each controller to the dynamic process models across the test dataset, and calculate the average NOx reduction along with the average CO production above the maximum CO constraint. Formally, the average NOx percent reduction will be reported according to


$$\overline{\Delta NOx} = \frac{1}{T_f - T_i} \sum_{t=T_i}^{T_f} \left[ NOx^*(t) - NOx(t) \right], \qquad (89)$$

where $T_i$ is the start of the test dataset, $T_f$ is its end, $NOx(t)$ is the value of NOx at time $t$ predicted by the dynamic process models in response to the actual MV setpoints $mv(t)$, and $NOx^*(t)$ is the predicted value of NOx in response to the controller's optimal MV setpoints $mv^*(t)$.

Similarly, the average CO above the maximum constraint will be reported according to

$$\lceil CO \rceil = \frac{1}{T_f - T_i} \sum_{t=T_i}^{T_f} \begin{cases} CO^*(t) - CO^{max}, & CO^*(t) > CO^{max} \\ 0, & \text{else} \end{cases} \qquad (90)$$

where $CO^*(t)$ is the value of CO at time $t$ predicted by the dynamic process models in response to the controller's optimal MV setpoints $mv^*(t)$, and $CO^{max}$ is the max limit set for CO. All of the results presented are for a CO maximum of 500 ppm.
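
The two offline metrics can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the trajectories are assumed to be plain lists of predicted values over the test dataset, and the averages divide by the number of samples, which differs from a $T_f - T_i$ normalization by a single sample.

```python
from typing import Sequence


def average_nox_change(nox_opt: Sequence[float],
                       nox_actual: Sequence[float]) -> float:
    """Mean of NOx*(t) - NOx(t); negative values indicate a NOx reduction."""
    if len(nox_opt) != len(nox_actual):
        raise ValueError("trajectories must have the same length")
    return sum(o - a for o, a in zip(nox_opt, nox_actual)) / len(nox_opt)


def average_co_above_max(co_opt: Sequence[float],
                         co_max: float = 500.0) -> float:
    """Mean CO excess above the constraint; samples below it contribute 0."""
    return sum(max(c - co_max, 0.0) for c in co_opt) / len(co_opt)
```

For example, `average_co_above_max([400, 600, 700], 500)` averages the excesses 0, 100 and 200 over three samples.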

The optimal MV, SV and CV trajectories across the test dataset produced by each controller are estimated by:

I: Set optimal MV, SV and CV trajectories to their actual values

$$mv^*(t) = mv(t), \quad sv^*(t) = sv(t), \quad cv^*(t) = cv(t) \qquad (91)$$

II: For $t = T_i$ to $T_f - 1$

i: Get the optimal MVs from the controller, $mv^*(t)$

ii: Use the SV model to calculate the resulting SVs, $sv^*(t+1)$

iii: Use the CV model to calculate the resulting CVs, $cv^*(t+1)$

iv: Copy forward the MVs to carry forward the process state to initialize the next step

$$mv^*(t+1) = mv^*(t) \qquad (92)$$

v: Next $t$
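
The trajectory estimation above can be sketched as a simple rollout loop. In this sketch the controller and the SV and CV models are passed in as plain callables with assumed, hypothetical signatures, and each variable is treated as a scalar for brevity; the real models take vectors of MVs, DVs and SVs.

```python
from typing import Callable, List, Sequence, Tuple


def rollout(mv: Sequence[float], sv: Sequence[float], cv: Sequence[float],
            controller: Callable[[float, float, float], float],
            sv_model: Callable[[float, float], float],
            cv_model: Callable[[float, float], float],
            ) -> Tuple[List[float], List[float], List[float]]:
    """Estimate the optimal MV, SV and CV trajectories over a dataset."""
    # Step I: start from the actual recorded trajectories.
    mv_opt, sv_opt, cv_opt = list(mv), list(sv), list(cv)
    for t in range(len(mv_opt) - 1):  # Step II: t = T_i .. T_f - 1
        # i: query the controller for the optimal MV at time t.
        mv_opt[t] = controller(mv_opt[t], sv_opt[t], cv_opt[t])
        # ii: the SV model predicts the resulting SV at t + 1.
        sv_opt[t + 1] = sv_model(mv_opt[t], sv_opt[t])
        # iii: the CV model predicts the resulting CV at t + 1.
        cv_opt[t + 1] = cv_model(mv_opt[t], sv_opt[t + 1])
        # iv: copy the MV forward so the next step starts from this state.
        mv_opt[t + 1] = mv_opt[t]
    return mv_opt, sv_opt, cv_opt
```

Because step iv overwrites the recorded MV at $t+1$, the controller's moves accumulate along the trajectory instead of restarting from the historical setpoint at every step.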

The results of applying this algorithm to each controller are illustrated below. Figure 29 presents the average NOx reduction $\overline{\Delta NOx}$, while Figure 30 illustrates the average CO above its max $\lceil CO \rceil$. These are not particularly encouraging results. While each controller did manage to reduce NOx, the reductions were quite small. Furthermore, the controllers seemed to have even less effect on the CO above 500 ppm. The key question at this stage is whether these results represent:

1) all of the potential NOx reductions inherent to the process,

2) a problem with the control design, or

3) a problem with the reference models for the underlying process.

Figure 29: Average NOx reduction over testing dataset.

To help answer this question, the offline quantification algorithm was run a second time. This time, however, the test models were replaced with the reference models used to develop each controller. These results are presented in Figures 31 and 32.

Figure 30: Average CO above max over testing dataset.

Figure 31: Average NOx reduction over testing dataset using train and test models.

Clearly, there is a problem. The following observations can be made:

1)  the  controllers  appear  to  be  working  fine,  and 

2)  the  reference  models  and  the  test  models  are  providing  inconsistent 
knowledge  about  the  process. 


[Figure 32 plot: "Avg CO Above Max" for the Baseline, Steady-State MPC, Steady-State MIC, Dynamic MPC and Dynamic MRAC cases, with Train Models and Test Models series.]

Figure 32: Average CO above max over testing dataset using train and test models.

Chapter 8 addresses these issues in more detail, but for now the quantification of the current controllers continues.

7.2  Online  Quantification 

Online quantification will be restricted to measuring the controllers' ability to affect the steady-state performance of the unit. Quantifying online performance will require running online experiments, where MVs are moved and the resulting CVs measured. Each experiment will have to be of relatively short duration, since the longer an experiment takes the less likely it is that steady-state conditions will be maintained. In order to compare the performance of different controllers, which will invariably have to perform their actions under different conditions, baseline conditions will be established prior to each experiment. These baseline conditions will be stated in terms of MVs since we have no direct control over DVs. To account for differences in the DVs between individual experiments, the MVs will be returned to the baseline conditions prior to each experiment.

Each online experiment will follow this protocol: The unit operator is asked to bring the unit to steady-state conditions, i.e., holding all MV setpoints constant. After the unit has reached steady-state, measurements are taken to establish baseline conditions, $mv^{base}$ and $cv^{base}$. The controller is then queried for MV setpoints, $mv^*(t)$, which will be applied by the unit operator.

New setpoints will be repeatedly queried and applied until the unit has once again returned to steady-state. The frequency with which an operator carries out this process will be at their discretion. If the operator feels that individual MV setpoints cannot be made, then these setpoints are constrained and the controller is queried for another set of MVs. When none of the MV setpoints can be made, or the controller has no new advice, i.e., it has saturated against its constraints, then the unit is at steady-state and the experiment is terminated.

The CVs are once again measured to establish experiment conditions, $cv^{exp}$. The difference between the experiment and baseline conditions will be called the experiment delta, $\Delta^{exp}_{cv}$. The operators are then asked to return the MV setpoints to their baseline conditions $mv^{base}$, and a third measurement is taken to establish validation conditions $cv^{val}$. The difference between the experiment and validation conditions will be called the validation delta, $\Delta^{val}_{cv}$. Notice that if the validation conditions match the baseline conditions, then the experiment and validation deltas will have identical magnitude and sign.

Controllers are then compared based on their ability to affect the CVs relative to these baseline values across multiple experiments. Key to the accuracy of these experiments will be how these measurements are reported. The next section outlines the measurement methodology applied during online quantification.

7.2.1  Measurement  Methodology 

One of the most critical aspects of quantifying the effect that different control strategies have on a real process will be determining the significance of our results relative to process noise and changing steady-state conditions. The following outlines the statistical measurement methodology followed in this study.

Each measurement, e.g. $cv^{base}$, taken of the unit's steady-state condition for a variable at time $t_0$, e.g. $cv(t_0)$, will implement the following:

I: Collect $r$ observations, $\{cv(t)\}_{t=t_0}^{t_0+r-1}$

II: Calculate the sample mean

$$\mu^{base} = \frac{1}{r} \sum_{t=t_0}^{t_0+r-1} cv(t) \qquad (93)$$

III: Calculate the sample standard deviation

$$\sigma^{base} = \sqrt{\frac{1}{r-1} \sum_{t=t_0}^{t_0+r-1} \left( cv(t) - \mu^{base} \right)^2} \qquad (94)$$

Each delta calculated, e.g. $\Delta^{val}_{cv}$, of changes to the unit's steady-state condition will implement the following:

I: Given measurements for $cv^{base}$ and $cv^{val}$

II: Calculate the lower bound of a $(1 - \alpha)100\%$ confidence interval for the difference between two sample means according to
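
A minimal Python sketch of the measurement and delta calculations follows. It assumes a large-sample, unequal-variance interval for the difference between two sample means, with a normal quantile in place of a Student's t quantile; this form, the function names, and the default `alpha` are illustrative assumptions and may differ from the expression used in the study.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def measure(observations: Sequence[float]) -> Tuple[float, float, int]:
    """Steps I-III: sample mean, sample std (r - 1 denominator), and r."""
    return mean(observations), stdev(observations), len(observations)


def delta_lower_bound(first: Sequence[float], second: Sequence[float],
                      alpha: float = 0.05) -> float:
    """Lower bound of a two-sided (1 - alpha) CI for mean(second) - mean(first)."""
    mu_1, sd_1, n_1 = measure(first)
    mu_2, sd_2, n_2 = measure(second)
    # Standard error of the difference between two independent sample means.
    std_err = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    # Normal approximation; a t quantile would be wider for small r.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (mu_2 - mu_1) - z * std_err
```

Under these assumptions, a delta whose interval excludes zero would be treated as significant relative to the process noise captured in the two measurements.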


7.2.2  Results 

Given the results obtained in Section 7.1 "Offline Quantification," there seems little hope of conducting successful online experiments. While online tests might be useful in confirming the offline results, there are considerable costs associated with conducting them. In addition to the time and resource requirements, there is the invaluable capital of buy-in from operations and engineering to be considered. Most of these operators and engineers have spent 5 to 20 years learning how to drive and maintain the unit. They are the experts. Bringing in a new technology that is going to show them how to better operate their unit needs to be managed carefully. The operators and engineers have seen many technologies come and go and have never seen a technology able to model unit emissions, much less control them.

With all of this said, 10 experiments were conducted with the steady-state optimizer. This was the first controller implemented, and these experiments were conducted before all of the above results were analyzed. The results were about as encouraging as they were in our offline analysis, and are presented in Figures 33 and 34.

Figure 33: Change in NOx for steady-state controller experiments.

Figure 33 shows the measured NOx change between the baseline and experiment, along with the corresponding change between the experiment and validation regions. We can see that there seems to be a slight decrease in NOx. The average NOx percent change for all experiments was 4.8%. The changes were so small with respect to the steady-state NOx variance, however, that few of the experiments proved significant.

Figure 34 shows the corresponding CO changes. Here, it's hard to see any trend. The average CO percent change for all experiments was 2.7%. Once again, these changes are so small that few of the experiments proved significant.

Figure 34: Final CO level for steady-state controller experiments.


CHAPTER  8 
PARAMETERIZATION  PROBLEM 

The work thus far has produced some unexpected results. Applying accepted neural network modeling techniques has produced several models for a process that contain drastically different "knowledge" about the process. Here, knowledge refers to the cause-and-effect relationships between MVs and CVs, which is what each of the control designs depends upon. Contrast this with our modeling results from Section 6.8 "Analysis," where these same models demonstrated consistent and robust "knowledge" of the process, where knowledge was considered to be the models' ability to predict or forecast a blind dataset. Also notice that this same blind test dataset was used to perform the offline quantification of the controllers.

This problem is analogous to the parameterization problem in classical adaptive control theory. An adaptive control system centers around the idea that a process is described as a mathematical function with parameters. It might therefore be expected that the way parameters are estimated is essential to the success of an adaptive controller. It is useful to view parameter estimation in the broader context of system identification. The key elements of system identification are the selection of model architecture, experiment design, parameter estimation, and validation. Since system identification is executed automatically in adaptive systems, it is essential to have a good understanding of all aspects of the problem. The elements of system identification are known to be fundamental issues in adaptive control theory.

Even though this is a problem from classical controls, there are no classical solutions which apply to neurocontrol designs. The primary reason for this is the use of neural networks as process models. Neural networks are fundamentally different from linear or first-principles-based process models in that they are nonparametric. The individual parameters have no physical interpretation, i.e., they are meaningless coefficients in a black box. We are therefore seeking a method to validate the parameterization of a nonparametric system. In classical adaptive control theory, parameterization is a design-time issue that is typically dealt with analytically. We can expect that the solution for neurocontrol theory will, like most other aspects of neural-network-based system design, have to be dealt with empirically.

8.1  Search  for  a  Validation  Metric 

Let's begin our search for a methodology to validate the parameterization of neural network models by reviewing the metrics used to assess the performance of these models.
Figure  35  presents  three  of  these  metrics  for  the  steady-state  CV  MLP  model.  The  primary 
metric  of  model  quality  used  in  this  study  has  been  correlation.  The  correlation  between 
the  actual  unit  NOx  emission  and  the  MLP's  prediction  of  NOx  was  0.801.  This  metric 
indicates  that  the  MLP  model  understands  a  considerable  amount  about  the  variation  in 
NOx  production.  Furthermore,  notice  that  the  MLP  produced  this  prediction  by  observing 
the  MVs,  DVs  and  SVs  only,  implying  that  the  MLP  understands  a  considerable  amount 
about  the  relationship  between  these  variables  and  NOx  formation. 

The normalized mean-squared error (NMSE) was also used to assess the worthiness of each model. Recall that an NMSE greater than or equal to 1 indicates that the model is performing no better than a trivial prediction, which simply predicts the statistical mean. Here the MLP predicted the blind test dataset with an NMSE of 0.298, indicating that the model is doing considerably better than simply predicting the mean.
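
The two metrics discussed so far can be sketched as follows. This is an illustrative Python sketch with hypothetical function names; it assumes the NMSE is the squared prediction error normalized by the squared deviation of the measured values from their own mean, which is the definition consistent with the "trivial prediction" interpretation above.

```python
from math import sqrt
from typing import Sequence


def nmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared error normalized by the spread of the actual values."""
    mean_actual = sum(actual) / len(actual)
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    spread = sum((a - mean_actual) ** 2 for a in actual)
    return squared_error / spread


def correlation(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient R between actual and predicted."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)
```

With this definition, predicting the mean of the measured values at every sample gives an NMSE of exactly 1, and a perfect prediction gives 0.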


ssCVModel  MLP 


Figure  35:  Summary  of  validation  metrics  for  MLP  CV  model. 


The correlation and NMSE results provided in Figure 35 are for the model with the lowest cross-validation MSE from a set of 10 models trained from different random initial conditions. Also recall that the offline quantification of the controllers was performed using the models with the second lowest cross-validation MSE. Furthermore, remember that we determined that the controllers performed very well when their reference models were used to quantify their performance. We are therefore looking for a validation metric which would provide some insight into the difference between these two models. The correlation and NMSE results presented in Figure 35 cannot differentiate between these two models, because they relate to only one of these models.

It is clear that the validation metric that we are seeking should provide an indication about how a particular model compares with the ensemble of possible models which could be developed for the process. For nonlinear models trained with a gradient descent learning algorithm, this must include some understanding about the structure of the performance surface. Figure 36 presents the correlation and NMSE over the blind test dataset for each of the 10 training runs of the steady-state CV MLP model. Recall that each training run starts from random initial conditions. With the exception of two training runs, which appear to have been trapped in local minima, there is relatively consistent performance across the 10 sample models from the ensemble of possible models. Furthermore, there is virtually no difference between the best model and the second best model. Clearly, the correlation and NMSE metrics are not going to provide insight into why two parameterizations for the same controller provide significantly different results.


Figure  36:  NMSE  and  R  for  all  10  training  results  for  MLP  CV  model. 


In addition to correlation and NMSE, Figure 35 presented a metric labeled "std err/var." The realization that we are looking for a metric capable of differentiating between a single model and the ensemble of possible models leads naturally to considering standard errors. Recall that Tibshirani [61] proposed the "bootstrap" method for estimating the standard errors of an MLP's predictions. Having estimates for the standard errors will provide insight into variability in the predictions between models drawn from the ensemble. The "std err/var" column in Figure 35 presents the average standard error calculated using the 10 model trainings and the "bootstrap" method across the test dataset, divided by the measured NOx variance across this dataset. The reason for dividing by the variance is to provide an intuitive feel for the reported value of this metric. A standard error equal to this variance indicates almost no confidence in the model predictions, while a standard error significantly smaller than the variance indicates confidence that the models understand more than the natural variation of the variable. The "std err/var" metric value of 0.358 reported in Figure 35 fails to differentiate our MLP models.
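
One way to compute a metric of this kind is sketched below. As a simplifying assumption, the standard error at each time step is taken to be the sample standard deviation of the predictions across the trained models; a full bootstrap would also resample the training data before each training. The function name and the list-of-lists input layout are hypothetical.

```python
from statistics import mean, stdev, variance
from typing import Sequence


def std_err_over_var(predictions: Sequence[Sequence[float]],
                     actual: Sequence[float]) -> float:
    """Average across-model std of predictions, divided by the variance
    of the measured variable over the same dataset.

    predictions[m][t] is the prediction of model m at time step t.
    """
    n_steps = len(actual)
    # Spread of the ensemble's predictions at each time step.
    per_step_std = [stdev([model[t] for model in predictions])
                    for t in range(n_steps)]
    return mean(per_step_std) / variance(actual)
```

A small value means the models in the ensemble agree with one another relative to the natural variation of the variable; it says nothing about whether they agree on the effect of moving an individual MV.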

The problem with all of these metrics is that they assess the models' ability to predict future outputs of the process. The models are able to make accurate predictions because, although the test dataset is blind, it must contain process relationships (cause and effect) similar to those in the training dataset. When our control algorithm is run across this dataset, it is changing these relationships. There are three likely mechanisms by which this could happen:

1) The MVs and DVs are not independent. When the controller moves an MV it assumes that the other MVs and DVs remain constant.

2) The SV models are not accurately modeling the impact of MV moves.

3) The MVs are highly correlated. Thus changing an MV in the test dataset breaks our assumption that the correlation relationships within the training and test datasets are the same.

It is unlikely that case 1) is the cause of our troubles. If the MVs and DVs are not independent, then there is no way for any of the models to have determined this, given that all of the models have been developed from the same MV and DV variable sets. If in fact these variables are not independent, then the impact of this on our control designs would be the same as if these variables were correlated, which is considered in case 3).

If there is a problem with the SV models, this would definitely have an impact on the results of our controllers. When examining the validation metric for the steady-state CV model above, we were providing it with the actual SVs as input rather than the predicted SVs from the SV model. Recall that the latter configuration is how the controller utilizes these models. We have seen, however, that the correlation and NMSE metrics for the SV models demonstrate that these models are able to predict the SVs over the blind test dataset. To verify that this is not the cause of our problems, Figure 37 presents the results for the combined SV/CV configuration for the same metrics considered in Figure 35. Here, although using the SV/CV combined model did reduce the fidelity of our ability to predict the CVs, the degradation in performance does not explain the poor performance of our controller.


Figure 37: Summary of validation metrics for combined SV/CV model.

Case 3) is a valid cause for our parameterization problem. When modeling an industrial process, correlated variables are going to be a fact of life. Unfortunately, the available tools in the literature for dealing with such situations are very limited. The following sections will demonstrate that correlation has a tremendous impact on the design of model-based controllers.

8.2  Correlation  Paradox 

Correlated variables are a natural phenomenon, as two variables are correlated when they are related through some physical laws or process. In fact, the mission statement for the empirical investigator is to infer, or learn, these physical laws from observations of process data. Hence, correlation is a double-edged sword. Without it learning is not possible as there is nothing to infer, while unanticipated correlation makes it difficult, if not impossible, to understand what has been learned or inferred.

Industrial process plants, through centralized control, systematically correlate their process variables. The plant's distributed control system (DCS) maintains a large number (thousands to tens of thousands) of feed-forward and feedback control loops, from a relatively small number (tens to hundreds) of operator setpoints. The DCS is designed to batch control over as many subsystems as possible, from the smallest possible number of operator setpoints. If it were feasible, the operator would only have a single setpoint called demand. As a rule, most variables within an industrial plant will be highly correlated to plant demand.

The effects of correlation have been well documented for modeling applications like system identification [56] and regression [53]. The most significant attribute that these applications have in common is that they deal with systems that are either linear, or have a relatively simple parametric nonlinear form. This feature allows an investigator to analytically assess the significance of model parameters. In fact, the correlation effects are automatically accounted for in significance testing. Thus the validation required to assess the parameterization of a model is accomplished through significance testing.

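To illustrate the effect of correlation, consider the physical processing plant that produces output $y$ from inputs $u_1$ and $u_2$. An investigator building a model for this plant is presented with data for $y$, $u_1$ and $u_2$, without any information about the physical system. The investigator hypothesizes that the data was generated according to the regression model

$$y = \beta_1 u_1 + \beta_2 u_2 + \eta, \qquad (96)$$

where $\beta_1$ and $\beta_2$ are unknown coefficients, and $\eta$ is a zero-mean uncorrelated disturbance term. Applying the method of least squares (LS), Ramanathan [53] shows that the corresponding normal equations are given by

$$\sum y u_1 = \hat{\beta}_1 \sum u_1^2 + \hat{\beta}_2 \sum u_1 u_2 \qquad (97)$$

$$\sum y u_2 = \hat{\beta}_1 \sum u_1 u_2 + \hat{\beta}_2 \sum u_2^2 \qquad (98)$$

with solutions

$$\hat{\beta}_1 = \frac{\sum y u_1 \sum u_2^2 - \sum y u_2 \sum u_1 u_2}{\sum u_1^2 \sum u_2^2 - \left( \sum u_1 u_2 \right)^2} \qquad (99)$$

$$\hat{\beta}_2 = \frac{\sum y u_2 \sum u_1^2 - \sum y u_1 \sum u_1 u_2}{\sum u_1^2 \sum u_2^2 - \left( \sum u_1 u_2 \right)^2} \qquad (100)$$

and variances

$$Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum u_1^2 \, (1 - r^2)} \qquad (101)$$

$$Var(\hat{\beta}_2) = \frac{\sigma^2}{\sum u_2^2 \, (1 - r^2)} \qquad (102)$$

$$Cov(\hat{\beta}_1, \hat{\beta}_2) = \frac{-\sigma^2 r}{\sqrt{\sum u_1^2 \sum u_2^2} \, (1 - r^2)}, \qquad (103)$$

where $\sigma^2 = Var(\eta)$ and $r$ is the correlation coefficient between $u_1$ and $u_2$.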

Suppose $u_1$ and $u_2$ are highly correlated, so that $r$ is near $\pm 1$. It is evident from
equations (101) and (102) that the variances, and hence the standard errors, of $\hat{\beta}_1$
and $\hat{\beta}_2$ will be very large when $r^2$ is close to 1. A large variance means poor
precision and a low Student t-statistic, which results in insignificance. In addition, we can see
from equation (103) that the covariance between the regression coefficients will be very large
in absolute value. If the estimates are correlated, each coefficient is capturing part of the
effect of the other variable, and hence it is difficult to obtain the separate effects of $u_1$
and $u_2$ on $y$. In other words, we cannot hold $u_2$ constant and increase $u_1$ alone, because
$u_2$, being correlated with $u_1$, will also change as a result. The same holds vice versa, since
correlation is not causation.
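The variance inflation described above can be checked numerically. The following is a minimal sketch, assuming a no-intercept model and zero-mean inputs as in equation (96); the helper `make_inputs` and its four-point dataset are hypothetical and exist only to produce inputs with a chosen correlation coefficient.

```python
import math


def moment_sums(u1, u2):
    """Return the sums of squares and cross-products used in the LS formulas."""
    s11 = sum(a * a for a in u1)
    s22 = sum(b * b for b in u2)
    s12 = sum(a * b for a, b in zip(u1, u2))
    return s11, s22, s12


def ls_two_inputs(u1, u2, y):
    """Solve the normal equations for y = b1*u1 + b2*u2 (no intercept)."""
    s11, s22, s12 = moment_sums(u1, u2)
    sy1 = sum(t * a for t, a in zip(y, u1))
    sy2 = sum(t * b for t, b in zip(y, u2))
    det = s11 * s22 - s12 ** 2
    b1 = (sy1 * s22 - sy2 * s12) / det
    b2 = (sy2 * s11 - sy1 * s12) / det
    return b1, b2


def coefficient_variances(u1, u2, sigma2):
    """Return Var(b1), Var(b2), Cov(b1, b2) and r for disturbance variance sigma2."""
    s11, s22, s12 = moment_sums(u1, u2)
    r = s12 / math.sqrt(s11 * s22)
    var1 = sigma2 / (s11 * (1.0 - r * r))
    var2 = sigma2 / (s22 * (1.0 - r * r))
    cov = -sigma2 * r / (math.sqrt(s11 * s22) * (1.0 - r * r))
    return var1, var2, cov, r


def make_inputs(rho):
    """Hypothetical zero-mean inputs whose correlation coefficient equals rho."""
    u1 = [1.0, -1.0, 1.0, -1.0]
    z = [1.0, 1.0, -1.0, -1.0]  # orthogonal to u1 and of the same length
    u2 = [rho * a + math.sqrt(1.0 - rho * rho) * b for a, b in zip(u1, z)]
    return u1, u2


if __name__ == "__main__":
    for rho in (0.0, 0.9, 0.99):
        u1, u2 = make_inputs(rho)
        var1, var2, cov, r = coefficient_variances(u1, u2, sigma2=1.0)
        print(f"r={r:.2f}  Var(b1)={var1:.3f}  Var(b2)={var2:.3f}  Cov={cov:.3f}")
```

With the disturbance variance held fixed, the printed variances grow by the factor $1/(1 - r^2)$ as the inputs become more correlated, while the point estimates from `ls_two_inputs` remain unbiased.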

Ramanathan  [53]  offers  the  following  properties  of  models  derived  from  correlated 
input  variables: 

1) If two or more explanatory variables in the multiple input (MI) model
are exactly linearly related, then the model cannot be estimated.

2) If some explanatory variables are nearly linearly related, then the OLS
estimators are still BLUE and MLE, and hence are unbiased, efficient,
and consistent.

3) The effect of correlation among input variables is to increase the standard
errors of the parameters and reduce the t-statistics, thus making
these parameters less significant (and possibly even insignificant). The
tests of hypotheses are, however, valid.

4)  The  covariance  between  the  parameters  of  a  pair  of  highly  correlated 
variables  will  be  very  high,  in  absolute  value,  thus  making  it  difficult  to 
interpret  individual  coefficients. 

5)  Correlation  may  not  affect  the  forecasting  performance  of  a  model  and 
may  possibly  even  improve  it. 

8.2.1  Effects  of  Correlation  on  Neural  Network  Modeling 

The neural network literature is not without reference to the issues of correlated inputs,
but the number of references is disproportionately low. These issues have not received
the attention within the neural network community that they have in related modeling
communities like system identification. The easy answer to why is that neural networks
are highly nonlinear structures, making investigations into relevant statistics difficult if not
impossible. A more thorough understanding, however, lies in the way in which neural
networks have been applied to date. The author offers the following observations:

1) Descent-based learning will always arrive at a solution, regardless of
the degree of correlation within the input space. The solution, however,
is rarely unique or globally optimal.

2) The vast majority of applications for neural networks rely purely on
their ability to forecast. For this reason, standard errors for a network's
predictions have only recently become a topic of research.

3) If the correlation relationships in the input space of all testing datasets
are identical to those in the training dataset, then the correlated inputs
will not degrade the network's predictions. This situation, which will
often be the case for prediction applications, is unlikely in control
applications, since the controller will independently move these inputs.

4) The network coefficients, its weights, have no parametric interpretation.
There has been little reason, even if we knew how to, to calculate their
standard errors and significance.

Because of the way in which neural networks have been applied, the vast majority of
work done on the issue of correlated inputs deals almost entirely with its effect on the
dynamics of learning. Correlation in the input space drastically reduces the rate of
convergence for descent-based learning algorithms [22]. To counter this effect, the most
common neural network solution to correlation is to project the input space onto a
lower-dimensional subspace with an orthogonal basis. Most of these projection operators are
linear and based on the energy or information content of the variables. The most common such
projection is principal component analysis. The result of using such a projection is to
precede the neural network with a simple matrix multiply. The actual inputs to the neural
network are then completely, or nearly completely, uncorrelated features, which accelerates
the convergence of the learning algorithm.
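As an illustration of the projection stage described above, the following is a minimal sketch of principal component analysis for two inputs, written with the standard library only. It assumes the two-input case so that the principal axes have a closed form; the function names are hypothetical and are not taken from this work.

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def pca_rotation_2d(u1, u2):
    """Return the 2x2 projection matrix whose rows are the principal directions."""
    c11 = covariance(u1, u1)
    c22 = covariance(u2, u2)
    c12 = covariance(u1, u2)
    # Angle of the first principal axis of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    c, s = math.cos(theta), math.sin(theta)
    return ((c, s), (-s, c))


def project(u1, u2, matrix):
    """Center the inputs and apply the matrix multiply that precedes the network."""
    n = len(u1)
    m1 = sum(u1) / n
    m2 = sum(u2) / n
    (a, b), (c, d) = matrix
    z1 = [a * (x - m1) + b * (y - m2) for x, y in zip(u1, u2)]
    z2 = [c * (x - m1) + d * (y - m2) for x, y in zip(u1, u2)]
    return z1, z2


if __name__ == "__main__":
    # Hypothetical, strongly correlated inputs.
    u1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    u2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
    z1, z2 = project(u1, u2, pca_rotation_2d(u1, u2))
    print("cov(u1, u2) =", covariance(u1, u2))
    print("cov(z1, z2) =", covariance(z1, z2))
    print("var(z1), var(z2) =", covariance(z1, z1), covariance(z2, z2))
```

The projected features are uncorrelated by construction, and keeping only the first feature gives the lower-dimensional input. The sketch says nothing about whether the sensitivities of the combined transform-plus-network model are consistent.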

Although very useful, these techniques do not provide a solution to the parameterization
problem. These techniques can be used to improve the consistency of the neural network
portion of the model, but do not improve the consistency of the combined model
containing the transform stage with the neural network. This point may be confusing now,
but the following sections should help to clarify it.

In summary, neural networks have found an applications niche where this robust
predictor has demonstrated the ability to out-forecast more traditional methods. One of the
problems with moving neural networks from an academic interest to an accepted modeling
methodology is the lack of standardized reporting. The vast majority of neural network
applications to date apply the inferences of a model without knowledge of their statistical
significance. Recent work on standard errors for these predictions has taken an important
step towards solving part of this problem [27].

8.2.2  Effects  of  Multicollinearity  on  Model-Based  Control 

Each of the controllers presented in Chapter 4 belongs to the general family of
model-based control, i.e., each explicitly uses a reference model during offline training or
online control. The process knowledge provided by the model in each of these control
designs is

$$\frac{\partial \mathbf{cv}}{\partial \mathbf{mv}}, \qquad (104)$$

where the vectors $\mathbf{cv}$ and $\mathbf{mv}$ are the CVs and MVs, respectively. Recall our
example in Section 8.2 "Correlation Paradox," where the model was given by

$$y = \beta_1 u_1 + \beta_2 u_2 + \eta. \qquad (105)$$

If this model were used as a reference model for a model-based control scheme,
$\mathbf{cv} = \{y\}$ and $\mathbf{mv} = \{u_1, u_2\}$ could be considered to be the CVs and MVs,
respectively. Hence the process knowledge provided by our model is the set
$\{\beta_1, \beta_2\}$. But as we have seen, when the MVs $u_1$ and $u_2$ are highly correlated,
the standard errors for $\beta_1$ and $\beta_2$ will be very large, a situation that will quickly
render our process knowledge insignificant. Clearly, this will have a tremendous impact on the
performance and robustness of our controller.

8.3 Validation Metric

Neural networks do not have the same convenient interpretation for their coefficients.
They do, however, infer relationships between the MVs and CVs, which are used directly
by the controllers. The method of extracting this relationship from a neural network
model is commonly referred to as sensitivity analysis. The MV sensitivities
$\partial \mathbf{cv} / \partial \mathbf{mv}$ can be calculated directly using backpropagation.
These sensitivities will be a function of the unit's operating state
$\{\mathbf{mv}, \mathbf{dv}\}$. Recall that we are trying to get a feel for how these
sensitivities vary across the ensemble of possible models. Figure 38 illustrates the MV
sensitivities for a typical operating state across the 10 steady-state MLP models,
representing a sampling from this ensemble.
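To make the sensitivity calculation concrete, the following is a minimal sketch for a single-hidden-layer MLP with tanh hidden units and a linear output. The architecture, the weight names `W`, `b`, `v`, `c` and the numerical values are assumptions chosen for illustration; they are not the networks used in this work. The sensitivity is obtained by applying the chain rule back through the network, which is the same computation backpropagation performs with respect to the inputs.

```python
import math


def hidden_activations(x, W, b):
    """tanh hidden layer: h_j = tanh(sum_i W[j][i] * x[i] + b[j])."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def mlp_output(x, W, b, v, c):
    """Scalar network output y = sum_j v[j] * h_j + c."""
    h = hidden_activations(x, W, b)
    return sum(vj * hj for vj, hj in zip(v, h)) + c


def mlp_sensitivities(x, W, b, v):
    """Input sensitivities dy/dx_i = sum_j v[j] * (1 - h_j**2) * W[j][i]."""
    h = hidden_activations(x, W, b)
    return [sum(v[j] * (1.0 - h[j] ** 2) * W[j][i] for j in range(len(W)))
            for i in range(len(x))]


if __name__ == "__main__":
    # Hypothetical weights and operating state.
    W = [[0.5, -0.3], [0.1, 0.8]]
    b = [0.1, -0.2]
    v = [1.5, -0.7]
    c = 0.05
    x = [0.2, -0.4]
    print("output:", mlp_output(x, W, b, v, c))
    print("sensitivities:", mlp_sensitivities(x, W, b, v))
```

Because the factor `1 - h_j**2` depends on the input `x`, the sensitivities change with the operating state, which is why they must be evaluated at a specific state or averaged across a dataset.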


Figure  38:  Sensitivity  results  for  all  10  training  results  for  NOx  CV  model. 

For the first time, we can clearly see our parameterization problem! Each model is
providing significantly different process knowledge to the controller. More explicitly, our
best and second-best models with respect to NMSE, R and SE, represented by training
runs number 3 and 7, look as if they are modeling two completely different processes.
Fortunately, there is hope.

Figure 39 provides the average sensitivity reported from the ensemble of models,
along with the corresponding 95% confidence intervals. This result demonstrates that we
are not able to assert the directional sensitivity with any degree of confidence. It is easy to
see why the controllers had so much trouble.
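The question of whether a directional sensitivity can be asserted can be phrased as a check on whether the confidence interval excludes zero. The sketch below assumes a Student t interval on the ensemble mean of 10 models; how the intervals in this work were actually constructed is not stated, so this construction, the function names and the sample values are all hypothetical.

```python
import math
import statistics

# Two-sided 95% Student t quantile for 9 degrees of freedom (10 models).
T_975_DF9 = 2.262


def sensitivity_interval(values, t_quantile=T_975_DF9):
    """Return (lower, upper) for the mean sensitivity across an ensemble."""
    mean = statistics.mean(values)
    half_width = t_quantile * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean + half_width


def sign_is_resolved(values, t_quantile=T_975_DF9):
    """True when the interval lies entirely on one side of zero."""
    lower, upper = sensitivity_interval(values, t_quantile)
    return lower > 0.0 or upper < 0.0


if __name__ == "__main__":
    # Hypothetical sensitivities of one MV across 10 trained models.
    inconsistent = [0.30, -0.25, 0.10, -0.35, 0.20, -0.05, 0.15, -0.20, 0.05, -0.10]
    consistent = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.11, 0.10, 0.12]
    for name, values in (("inconsistent", inconsistent), ("consistent", consistent)):
        print(name, sensitivity_interval(values), sign_is_resolved(values))
```

A controller that moves an MV whose interval straddles zero cannot know whether it is pushing the CV in the intended direction.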


[Figure 39 plot: mean MV sensitivity with 95% confidence intervals; vertical axis from -0.200 to 0.200.]

MV                Sensitivity
1/3 OFA             0.053
2/3 OFA            -0.025
FD Fan Inlet A      0.031
FD Fan Inlet B     -0.071
GR Fan Inlet A     -0.014
GR Fan Inlet B     -0.018
GR Fan Hppr A      -0.040
GR Fan Hppr B       0.092

Figure 39: NOx CV model sensitivity with 95% confidence intervals.


8.4  Revised  Representation  Pruning  Algorithm 

Recall that the goal of the representation pruning algorithm is to identify the best
variable representations within each group. In Chapter 6, the "best" variable representation
was determined by selecting the representation which produced a model with the highest
correlation on a blind test dataset. In hindsight, it is clear that the definition of "best" must
be application-specific. If the models were being applied to a forecasting application, then
the original representation pruning algorithm would have suited our purposes. For control
applications, however, it is clear that the definition of "best" will have to be modified. Note
that the robustness of the predictors produced by this algorithm should still be in doubt. If the
forecasts are always made for a dataset with correlation characteristics identical to those of the
training dataset, then the model's forecasts would be fine. If, however, the correlation is
less physical, then there is still cause for concern.

8.4.1  Input  Sensitivity  Standard  Errors 

Recall the "bootstrap" approach to calculating the prediction standard errors for a
neural network presented in Section 2.2.4.0.1 "Bootstrap methods." The validation metric used to
differentiate models for our model-based control designs will similarly need to
evaluate the standard error of the MV sensitivities. The following algorithm has been
developed to approximate the MV sensitivity standard errors using a "bootstrap" methodology:

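I:    Generate $B$ datasets, each one of size $N$, drawn with replacement
      from the $N$ training observations $\{\mathbf{u}_n, d_n\}$

II:   For each bootstrap dataset $b = 1, \ldots, B$, train a network by solving

$$\hat{\mathbf{w}}_b = \mathrm{ArgMin}_{\mathbf{w}_b} \left\{ J\left(d_b - \mathcal{N}(\mathbf{u}_b, \mathbf{w}_b)\right) \right\} \qquad (106)$$

III:  Estimate the standard error of the $i$th input sensitivity as

$$\hat{\sigma}_i = \left[ \frac{1}{B-1} \sum_{b=1}^{B} \left( s_i^b - \bar{s}_i \right)^2 \right]^{1/2}, \qquad (107)$$

where

$$\bar{s}_i = \frac{1}{B} \sum_{b=1}^{B} s_i^b \qquad (108)$$

and $s_i^b$ is the $i$th input sensitivity of the model trained on bootstrap dataset $b$.
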
Conceptually, this algorithm simply calculates the variance of the input sensitivities
across multiple models, all trained using different random initial conditions and
independently sampled "bootstrap" datasets. Clearly, the variance of the sensitivity calculations
presented in Figure 38 will be high. If an algorithm can be found that is capable of
reducing these standard errors, the offline controller quantification results should improve.
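The bootstrap procedure above can be sketched in a few lines. To keep the example short and deterministic, the sketch assumes a two-input linear least-squares model as a stand-in for the neural network, so the input sensitivities are simply the fitted coefficients and there are no random initial conditions; the data generator `make_data` and all parameter values are hypothetical.

```python
import math
import random


def fit_sensitivities(u1, u2, y):
    """Least-squares fit of y = b1*u1 + b2*u2; for this linear stand-in model
    the input sensitivities dy/du1 and dy/du2 are the coefficients b1 and b2."""
    s11 = sum(a * a for a in u1)
    s22 = sum(b * b for b in u2)
    s12 = sum(a * b for a, b in zip(u1, u2))
    sy1 = sum(t * a for t, a in zip(y, u1))
    sy2 = sum(t * b for t, b in zip(y, u2))
    det = s11 * s22 - s12 ** 2
    return [(sy1 * s22 - sy2 * s12) / det, (sy2 * s11 - sy1 * s12) / det]


def bootstrap_sensitivity_se(u1, u2, y, n_boot=200, seed=0):
    """Standard error of each input sensitivity across n_boot bootstrap fits."""
    rng = random.Random(seed)
    n = len(y)
    runs = []
    for _ in range(n_boot):
        # Step I: draw N observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Step II: fit a model to the resampled dataset.
        runs.append(fit_sensitivities([u1[k] for k in idx],
                                      [u2[k] for k in idx],
                                      [y[k] for k in idx]))
    errors = []
    for i in range(2):
        # Step III: sample standard deviation of the i-th sensitivity.
        mean_i = sum(run[i] for run in runs) / n_boot
        var_i = sum((run[i] - mean_i) ** 2 for run in runs) / (n_boot - 1)
        errors.append(math.sqrt(var_i))
    return errors


def make_data(rho, n=200, seed=1):
    """Hypothetical process y = u1 + u2 + noise with input correlation near rho."""
    rng = random.Random(seed)
    u1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u2 = [rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for a in u1]
    y = [a + b + rng.gauss(0.0, 0.1) for a, b in zip(u1, u2)]
    return u1, u2, y


if __name__ == "__main__":
    for rho in (0.0, 0.99):
        print("rho =", rho, "sensitivity SE =", bootstrap_sensitivity_se(*make_data(rho)))
```

With a neural network in place of `fit_sensitivities`, each bootstrap run would also start from a different random initialization, and the sensitivities would be evaluated at a chosen operating state or averaged across a dataset.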

8.4.2  Algorithm 

The revised representation pruning algorithm uses the standard error estimates for the
model's input sensitivity across a dataset, as presented in Section 8.4.1 "Input Sensitivity
Standard Errors," to determine "best" as follows:

I:    Apply the group selection pruning algorithm to the Essential Tag
      List in the appendix

II:   Select the first MV group in the reduced Essential Tag List

III:  If there is only one unique "representation" in the group, then select the
      next MV group and go to step III

IV:   For each unique representation in the selected group, train an
      MLP(15,5) 30 times with the variables for this representation as the
      only input variables from the selected group; the 30 models resulting
      from each training will be called a "representation" model set

V:    Calculate the standard error for the input sensitivities for each MV,
      $\hat{\sigma}_i$, across each representation model set

VI:   Calculate the average normalized input sensitivity standard error for
      each "representation" model set according to

(109)

VII:  Select the best "representation" model set as the set with the lowest
      input sensitivity normalized average error, and remove all variables
      associated with every "representation" model set except for the best
      "representation" model set

VIII: Select the next group and go to step III
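Steps V through VII can be sketched as follows. The normalization used here, dividing each standard error by the magnitude of the mean sensitivity, is an assumption made for illustration and should not be read as the definition in equation (109); the function names and the example sensitivities are likewise hypothetical.

```python
import statistics


def average_normalized_se(model_sensitivities):
    """Average normalized standard error for one representation model set.

    model_sensitivities is a list with one entry per trained model, each entry
    being the list of MV sensitivities reported by that model.
    """
    n_mv = len(model_sensitivities[0])
    total = 0.0
    for i in range(n_mv):
        values = [model[i] for model in model_sensitivities]
        se = statistics.stdev(values)          # step V: spread across the model set
        scale = abs(statistics.mean(values))   # assumed normalization (hypothetical)
        total += se / scale if scale > 0.0 else float("inf")
    return total / n_mv                        # step VI: average over the MVs


def select_best_representation(model_sets):
    """Step VII: return the representation with the lowest average normalized SE."""
    return min(model_sets, key=lambda name: average_normalized_se(model_sets[name]))


if __name__ == "__main__":
    # Hypothetical sensitivities for two representations of the same MV group,
    # with three trained models each (the text uses 30).
    model_sets = {
        "position": [[0.5, -0.2], [-0.4, 0.3], [0.1, -0.2]],
        "bias": [[0.30, -0.10], [0.32, -0.11], [0.29, -0.09]],
    }
    print(select_best_representation(model_sets))
```

The selection criterion rewards representations whose trained models agree on the sensitivities, regardless of how well each individual model fits the blind test data.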

Figure 40: Results of revised representation pruning algorithm.

The results after running the revised pruning algorithm for each of the 4 groups are
illustrated in Figure 40. Notice that, unlike the initial pruning algorithm, this algorithm
significantly degraded the fidelity of the model as it pruned each representation group. The
variable selections after the revised representation pruning algorithm are provided in
Table 14.

Table 14: Final variable selections after revised pruning.


Manipulated  Variables 

Disturbance  Variables 

State  Variables 

Control  Variables 

1/3  OFA  Damper  Bias 

Ambient  Air  Press 

Sec  Air  Temp  Side  A 

CEM  NOx 

2/3  OFA  Damper  Bias 

Ambient  Air  Temp 

Sec  Air  Temp  Side  B 

FD  Fan  Bias 

Bnr  Atm  Stm  Press 

CEM  CO 

O2 Trim

Bnr Atm Stm Temp

Generated MW

GR Fan 2A Inlet Dmpr Bias

CEM Barometric Pressure

Windbox Pressure

GR  Fan  2B  Inlet  Dmpr  Bias 

Cond  Back  Pres  -  Side  A 

GR  Fan  HpprDmprA  Bias 

Cond  Back  Pres  -  Side  B 

GR  Fan  HpprDmprB  Bias 

Fuel  Gas  Flow  Indication 

Fuel  Oil  Flow  Indication 

Fuel Temp Fired

Furnace  Pressure 

8.4.3  Variable  Representation 

The results of the revised representation pruning algorithm are very interesting, and
provide some lessons about variable representation choices. For each MV group, the
revised algorithm chose the operator bias, rather than the positions selected by the initial
algorithm. Recall the example given in Section 6.2.1.5 "Variable representation," where 5
tags were identified that represented the single process variable of gross airflow. As we
have seen, this many-to-one mapping between tags and variables is very common in
industrial plants. The tags selected for variable representation should have the following
characteristics:

•  Uncorrelated: The tag or tags chosen to represent the variable should be
   uncorrelated with each other and with tags chosen to represent the other input variables.

•  Representative: The tags chosen should provide a complete representation of the
   dynamics of the variable, i.e., they should be as correlated as possible to the
   process variable.

•  Dynamic: The tags chosen should contain as much of the real variation in the
   process variable as possible. If the variable is changing for any reason, even if
   the variation is not intended or not desirable, this information should be
   represented in the chosen tags.

From the discussion above, it is easy to see that all 5 tags, except FD fan trim, will be
highly correlated to plant demand. As noted previously, most tags will be correlated with
plant demand. This would make these four tags a poor choice for representing FD fan
airflow. FD fan trim, on the other hand, is not correlated to demand. The trim is a bias tag
that only moves when the operator wishes to alter the fuel-air mixture, represented by FD
fan demand, that was designed into the DCS. The bias is also appealing from an
optimization perspective, since it provides a way to tune control over the process without
having to control the process.

Now turn to the second and third criteria for variable representation above.
The FD fan trim is certainly correlated to gross airflow, since moving it will alter the FD
fan setpoint. Dynamic, however, the FD fan trim is not. The only time the trim moves is
when the operator touches it. For the unit considered in this study, the trim was rarely
touched.¹ In addition, the trim does not contain any of the natural or unintended variability
in airflow, e.g., the slack in the PID controller. This variability is real, in the sense that
airflow actually changed, and has an impact on the combustion process. Although not
intended, this variation provides rich data for learning.

So we can either choose a representation rich with dynamic information about the
process but highly correlated to demand, or a representation completely uncorrelated to
demand that is rarely moved. Pruning the variable representations based on the fidelity of
the resulting model will select tags rich in dynamic information regardless of correlation;
pruning based on standard errors of the knowledge extracted about the cause-and-effect
relationships will prioritize independent tags. The answer to this dilemma identified in this
work has two parts: 1) with sufficient parametric testing, i.e., structured movement of
the MVs, the models will be able to extract enough knowledge about the process for the
controller to function properly; and 2) when the controllers are connected to the plant, they
will move the MVs, and model retuning will continue to extract more knowledge about the
process.

¹ This is an interesting fact, since FD fan trim turned out to be one of the significant
levers over optimizing NOx.

Figure 41 illustrates the MV sensitivities for the same operating state illustrated in
Figure 38, across the first 10 of the 30 revised steady-state MLP models.

Figure 41: Sensitivity results for all 10 training results for revised NOx CV model.

There is a visible difference in the significance, i.e., our confidence, in these models as
compared to the original models. These models have extracted consistent process knowledge
with respect to the relationships between the MVs and CVs. A comparison of the
standard error calculations for these models and the initial models is presented in
Figure 42. This result clearly validates our observations about Figure 41.

Figure 42: Revised NOx CV model sensitivity with 95% confidence intervals

8.5  Modeling 

Figure 43 presents the results of training each of the best model architectures from
Chapter 6 with the new variable definitions presented in Table 14. Here, each model
architecture was trained 10 times, the training result with the lowest cross-validation error was
selected, and the selected result was tested against the blind test set. Notice that this
process is equivalent to the training process for the previous models.

Comparing Figure 43 with Figure 28, the new models have lost some fidelity with
respect to their ability to predict a blind test dataset. The degradation in fidelity, however,
is quite small when compared with the increase in confidence with respect to the MV
sensitivities that the new models have (Figure 42).

Figure 43: Best revised models for all model definitions by architecture

8.6  Control  Implementation 

The assumption is that models with higher confidence in the MV sensitivities will
provide better reference models for our model-based control designs. Each control design will
now be implemented with these new models and its performance re-quantified.


8.6.1  Offline  Quantification 

The offline performance for the new controllers will follow the same methodology
presented in Section 7.1 "Offline Quantification." Once again, the dynamic process models
chosen for offline quantification will not be the reference models used to implement any
of the control designs. In particular, the offline quantification will use the second-best
model, with respect to cross-validation MSE, as the dynamic process model.

Once again, offline quantification will apply each controller to the dynamic process
models across the test dataset and calculate the average NOx reduction along with the
amount of CO production above the maximum CO constraint. The NOx and CO results
are presented in Figures 44 and 45, respectively.

Figure 44: Average NOx reduction over testing dataset using old and revised models.

Figure 45: Average CO above max over testing dataset using old and revised models.


Recall that it was at this point in developing the controllers the first time that the
problem first surfaced. Clearly, the new controllers are performing much better than the
original ones, whose results have been included in Figures 44 and 45 for reference.

8.6.2 Online Quantification

The offline quantification first identified a problem with the control designs, and now
indicates that it has been solved. The proof, however, is always in the pudding, or at least
in the response of the actual plant. It stands to reason that if we were not able to develop
models with consistent knowledge of the process, then there is little hope that the resulting
controllers would be able to improve the process. Consistent models of the process,
however, do not necessarily imply that the resulting controllers will be able to improve the
process.

To quantify the ability of the controllers to improve the process, the methodology
outlined in Section 7.2 "Online Quantification" is applied. Once again, 10 experiments were
conducted with the steady-state optimizer. The results this time, however, are very
encouraging.

Figure 46 shows the measured NOx change between the baseline and experiment,
along with the corresponding change between the experiment and validation regions.
There is a significant decrease in NOx. The average NOx percent change for all
experiments is -24.83%.

Figure 47 shows the corresponding CO changes. Here, the controller was able to have
a significant impact on CO. The average CO for both baseline and experiment is
approximately 500 ppm, but the variance of the baseline data is 121.576 ppm while the experiment
standard deviation is 13.473.

Figure 46: Change in NOx for revised steady-state controller experiments.

Figure 47: Final CO level for revised steady-state controller experiments.

This online quantification procedure was repeated for each of the four control designs,
and the NOx and CO results are summarized in Figures 48 and 49. Included in these figures
are the estimated results that each controller thought it would achieve across the same
dataset; this was accomplished by using a simulator for the plant as described in Section
7.1 "Offline Quantification."


Figure  48:  Average  percent  NOx  reduction  for  10  online  experiments. 


There  are  some  interesting  observations  about  these  results: 

1) The performance of the MIC controller was substantially poorer than
the other control designs in both offline and online analysis.

2) The online performance of the steady-state optimizer was better than
either of the dynamic controllers.

3) The dynamic controllers performed significantly better in offline
analysis than they did in online analysis.

The poor performance of the MIC controller can be explained by its treatment of
constraints. Recall that the MV constraints were treated as penalty functions for the online
optimizer, and that the CO maximum constraint was addressed by placing CO in the CV
set and fixing it to 500 ppm. This leaves only a single degree of freedom for the online
optimizer, which is the single CV of NOx. The problem is that when the online optimizer
starts moving NOx to meet its objective function, it will have to stop as soon as one of the
MV constraints exercises its penalty function. There is no other degree of freedom for
the optimizer to explore, other than to simply stop moving NOx.

Notice that although penalty functions are also employed in the steady-state optimizer,
this problem does not exist. Here, the online optimizer has $N^{mv}$ degrees of freedom, the
number of MVs, so that when an MV constraint's penalty function is exercised it can
simply stop moving that MV. Furthermore, when an SV or CV constraint's penalty function is
exercised, the optimizer can explore other combinations of MVs to maintain this
constraint while still trying to lower NOx.
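The penalty-function treatment of constraints in the steady-state optimizer can be illustrated with a small sketch. The quadratic penalty form, the weight, the MV bounds and the two toy process models below are all assumptions made for illustration; only the 500 ppm CO maximum comes from the text.

```python
def bound_penalty(value, lower, upper, weight):
    """Quadratic penalty that is zero inside [lower, upper] and grows outside it."""
    if value < lower:
        return weight * (lower - value) ** 2
    if value > upper:
        return weight * (value - upper) ** 2
    return 0.0


def penalized_cost(mv, predict_nox, predict_co, mv_bounds, co_max=500.0, weight=1000.0):
    """Predicted NOx plus penalties for MV bound and CO maximum violations."""
    cost = predict_nox(mv)
    for value, (lower, upper) in zip(mv, mv_bounds):
        cost += bound_penalty(value, lower, upper, weight)
    # One-sided constraint: CO may be anything below co_max.
    cost += bound_penalty(predict_co(mv), float("-inf"), co_max, weight)
    return cost


def toy_nox(mv):
    """Hypothetical steady-state NOx model of two MV biases."""
    return 0.30 - 0.05 * mv[0] + 0.02 * mv[1]


def toy_co(mv):
    """Hypothetical steady-state CO model in ppm."""
    return 450.0 + 60.0 * mv[0]


if __name__ == "__main__":
    bounds = [(-1.0, 1.0), (-1.0, 1.0)]
    for mv in ([0.5, -0.5], [0.9, 0.0], [1.5, 0.0]):
        print(mv, penalized_cost(mv, toy_nox, toy_co, bounds))
```

Because the cost is a function of every MV, an optimizer minimizing it can keep moving the remaining MVs after one of them reaches its bound, which is the extra freedom the steady-state optimizer has over a design with a single degree of freedom.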

Figure 49: Average percent CO reduction above 500ppm for 10 online experiments.


The observation that the steady-state optimizer outperformed the dynamic controllers
is most likely a function of the online test procedure. Recall that the controllers were
deployed online in an advisory mode. Here, the controllers' optimal MV setpoints are
forwarded to an operator, who is then responsible for manipulating the actual plant setpoints.
One has to remember that the operators are responsible for the unit, which is worth several
hundred million dollars. Operators operate the unit based on years of experience and
dogma. Sorting out which is which is not an easy task, and invariably the operator will and
should make the final call about whether the requested setpoints are safe. The result was
that MV setpoint changes were not made at regular intervals, and what the optimizer
recommended as simultaneous MV moves might have been implemented one at a time over a
period as long as 20 minutes. Furthermore, recall that the measurement methodology,
presented in Section 7.2.1 "Measurement Methodology," required operating the unit under
steady-state conditions. Clearly, any dynamic advantage that these controllers might have
had was probably lost.

The observation that "the dynamic controllers performed significantly better in offline
analysis than they did in online analysis" can also be chalked up to the online test
procedure. In fact, this result reinforces this conclusion. The fact that the offline analysis
demonstrated a significant benefit to using dynamic controllers provides incentive to continue
work toward a closed-loop configuration.


CHAPTER  9 
CONCLUSION 

This project developed four neurocontrollers for the complex industrial process of
NOx formation. All of these neurocontrollers demonstrated benefit in an applications
area where traditional control designs have proven ineffective [39]. The first conclusion of
this study is that neurocontrollers are able to deal with highly complex industrial process
applications where more traditional methods have not been successful.

The objectives of this study went beyond demonstrating that a neurocontroller could
be developed, however. This work demonstrated neurocontrol designs that

1)  are  straightforward  to  implement, 

2)  account  for  dependent  internal  process  states,  and 

3)  can  deal  with  correlated  process  variables. 

9.1  Contributions 
9.1.1  Application-Based  Neurocontrol  Implementation  Methodology 

A neurocontrol implementation methodology was developed whereby a process
engineer with reasonable knowledge about the process variables can develop advanced
neurocontrollers. The process engineer was able to simply classify process variables as MVs,
DVs, SVs and CVs, and the methodology was able to automate the development of each
stage of the controller. A set of supporting algorithms was proposed, implemented and
validated.

9.1.2 State-Space Neurocontrol Designs

Several neurocontrol designs were implemented ranging from purely input/output to
full state-space. Consistent with the literature on neural network modeling [31], the full
state-space architecture was difficult to train and was thus not the best performer. The best
performer, however, was a partial state-space model. This work demonstrated that
available process knowledge can be used to create partial state-space models which can
significantly improve the overall controller performance.

9.1.3  Methods  for  Dealing  with  Correlation 

The primary limitations of neurocontrol identified during this study are directly related
to problems that arise when modeling with correlated input variables. In fact, the
overwhelming conclusion of this work is that correlation in the input space is the single most
important factor when designing an inductive model-based control system. Decisions over
static vs. dynamic modeling, single-stage vs. multi-stage optimization, model topology,
controller type and every other seemingly monumental decision will prove irrelevant if
correlation issues are not properly addressed.
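
One common way to quantify correlation in the input space before any model is trained is the variance inflation factor (VIF). The sketch below is illustrative only and is not the metric proposed in this work. The variable names (damper_a, damper_b, load) and the synthetic data are assumptions, chosen to mimic two manipulated variables that are moved together by operating practice alongside one independent variable.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are samples, columns are inputs)."""
    corr = np.corrcoef(X, rowvar=False)   # pairwise correlations between inputs
    return np.diag(np.linalg.inv(corr))   # VIF_j is the j-th diagonal of the inverse

rng = np.random.default_rng(0)
n = 500

# Hypothetical process data (assumed for illustration, not plant measurements).
damper_a = rng.normal(size=n)                     # manipulated variable
damper_b = damper_a + 0.05 * rng.normal(size=n)   # moved almost in lockstep with damper_a
load = rng.normal(size=n)                         # independent disturbance variable

X = np.column_stack([damper_a, damper_b, load])
vif = variance_inflation_factors(X)

for name, value in zip(["damper_a", "damper_b", "load"], vif):
    print(f"{name}: VIF = {value:.1f}")
```

In this synthetic case the two dampers receive VIF values far above the value near 1 obtained for the independent variable, which signals that the data cannot separate their individual effects on the output.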

A new metric and methodology for variable selection were proposed, developed and
validated as a viable solution to correlation issues for industrial applications. This
solution is a data mining approach for applications where many representations are available
of the same underlying process variables. This approach will not work for applications
where multiple choices for a process variable are not available. For such applications, the
author proposes, as an extension to this work, using the new metric as an objective for the
learning rule.

9.1.4  Accurate  Combustion  Models

Neural networks have found an application niche in which this robust predictor has
demonstrated the ability to out-forecast more traditional methods. One of the problems
with moving neural networks from an academic interest to an accepted modeling
methodology is the lack of standardized reporting. The vast majority of neural network
applications to date apply the inferences of a model without knowledge of their statistical
significance. The lack of standardized reporting has not clouded the success of neural
networks because most of these applications rely solely on the model's ability to forecast,
without regard for what relationships the model has inferred from the underlying process.

As clearly demonstrated in this work, however, the same cannot be said for neurocontrol.
The results presented here raise significant questions about the metrics being used to
evaluate model performance in the field of neural networks. Metrics that evaluate the
predictions made by a model do not imply that the model has learned the fundamental
cause-and-effect relationships within the process. A new metric was proposed and
demonstrated to overcome this limitation.
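
The distinction between predictive accuracy and learned cause-and-effect can be illustrated with a small synthetic experiment. The following sketch uses ordinary least squares as a stand-in for a neural network model, and bootstrap resampling as a generic way to estimate the spread of an input sensitivity. Neither is the metric proposed in this work, and all data, names and constants are hypothetical.

```python
import numpy as np

def bootstrap_fit(X, y, n_boot, rng):
    """Refit least squares on resampled rows; return coefficients and RMSE per refit."""
    n = len(y)
    coefs, rmses = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # sample rows with replacement
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        coefs.append(w)
        rmses.append(np.sqrt(np.mean((X @ w - y) ** 2)))  # prediction error on all data
    return np.array(coefs), np.array(rmses)

rng = np.random.default_rng(1)
n = 400
x1 = rng.normal(size=n)
x2_corr = x1 + 0.05 * rng.normal(size=n)   # nearly a copy of x1
x2_ind = rng.normal(size=n)                # unrelated to x1
y = 2.0 * x1 + 0.1 * rng.normal(size=n)    # only x1 truly drives the output

coef_c, rmse_c = bootstrap_fit(np.column_stack([x1, x2_corr]), y, 200, rng)
coef_i, rmse_i = bootstrap_fit(np.column_stack([x1, x2_ind]), y, 200, rng)

# Spread of the estimated sensitivity of y to x1 across the refits.
sens_std_corr = coef_c[:, 0].std()
sens_std_ind = coef_i[:, 0].std()

print(f"mean RMSE, correlated inputs:  {rmse_c.mean():.3f}")
print(f"mean RMSE, independent inputs: {rmse_i.mean():.3f}")
print(f"std of dy/dx1, correlated:     {sens_std_corr:.3f}")
print(f"std of dy/dx1, independent:    {sens_std_ind:.3f}")
```

Both models predict the output about equally well, yet the sensitivity of the output to x1 is far less stable when the inputs are correlated. A controller acts on that sensitivity rather than on the forecast, so a forecast-based metric alone would not reveal the problem.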

9.1.5  Novel  Combustion  Controller 

An online NOx optimizer was developed and evaluated. This optimizer demonstrated
a consistent 45% reduction in the overall NOx emissions from the power plant.

9.2  Afterword 

The benefits demonstrated in this study at the Canal Electric generating station have
been maintained for more than two years. Since these initial results, neurocontrollers have
been deployed on 12 generating units throughout the United States. In addition to being
installed on more units, the neurocontrollers have been extended with

1)  closed-loop implementations,

2)  a software program that automates the neurocontrol design methodologies,

3)  online automation of the variable and architecture pruning methodologies
using evolutionary computing techniques, and

9.3  Future  Direction 

The  author  offers  three  important  directions  for  extending  this  work: 

1)  development of learning algorithms that explicitly account for MV
sensitivity standard errors,

2)  algorithms to support automatic determination of variable type, and

3)  online adaptation of the model-reference adaptive controller using
reinforcement learning strategies.